Workshop

VisualScratchpad: Grounding Visual Concepts in Large Vision Language Models

Grounding visual concepts in large vision-language models via an attention-based linking mechanism.

hyesu-lim