DESIGNING AND EVALUATING A DUAL-STREAM TRANSFORMER-BASED ARCHITECTURE FOR VISUAL QUESTION ANSWERING