Code and Datasets

Current (as of 06/2020)

See below for older code and project pages.

Vision and Language

Multimodal:

Images with Text

VQA

  • In Defense of Grid Features for Visual Question Answering [pdf] [code] [CVPR presentation]
  • See also TextVQA and Multimodal above

Video Description

  • ActivityNet Entities

Long-tail

Continual Learning

Older

Visual Question Answering

Grounding

  • Grounding of referential expressions
    • with bounding boxes: Code
    • with segmentations: Code

Image and video description

Visual Knowledge Transfer with linguistic knowledge

Activity Recognition

older versions:

Depth, Multi-view, Human Pose