Projects and Datasets

Visual Question Answering

Grounding

  • Grounding of referential expressions
    • with bounding boxes: Code
    • with segmentations: Code

Image and video description

Visual Knowledge Transfer with linguistic knowledge

Activity Recognition

older versions:

Depth, Multi-view, Human Pose