Dongyoon Han

VisualScratchpad: Grounding Visual Concepts in Large Vision Language Models

Grounding visual concepts in large vision-language models via an attention-based linking mechanism.

hyesu-lim

Towards Calibrated Robust Fine-Tuning of Vision-Language Models

Calibrated, robust fine-tuning method for vision-language models that preserves uncertainty estimates under distribution shift.

changdae-oh