Course Materials - Spring 2025

 

CV806: Advanced Topics in Vision and Language

TB = Textbook or Required reading    REF = Reference or supplemental reading

Type  Title  eBook  Print  Call Number
R. Szeliski, Computer Vision: Algorithms and Applications, 2nd ed., Springer Verlag, 2022. Springer  Yes TA1634 .S97 2022
T.L. Berg, A.C. Berg, J. Edwards, M. Maire, R. White, Yee-Whye Teh, E. Learned-Miller, D.A. Forsyth, “Names and faces in the news,” in Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Nov. 2004, pp. 848–854. doi: https://doi.org/10.1109/cvpr.2004.1315253.  NA
P. Bojanowski, F. Bach, I. Laptev, J. Ponce, C. Schmid, J. Sivic. "Finding actors and actions in movies." In Proceedings of the IEEE international conference on computer vision, pp. 2280-2287, 2013.  NA  
A. Miech, J-B. Alayrac, L. Smaira, I. Laptev, J. Sivic, A. Zisserman. "End-to-end learning of visual representations from uncurated instructional videos," In Proc CVPR 2020.  Open Access NA

Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Krueger G. Learning transferable visual models from natural language supervision. In Proc ICML 2021.  Open Access NA  

Li J, Li D, Xiong C, Hoi S. BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In Proc ICML 2022.  ProQuest  Yes QA641 .C33 2016
Miech A, Alayrac JB, Laptev I, Sivic J, Zisserman A. Thinking fast and slow: Efficient text-to-visual retrieval with transformers. In Proc CVPR 2021.  Open Access NA
Kamath A, Singh M, LeCun Y, Synnaeve G, Misra I, Carion N. MDETR - Modulated detection for end-to-end multi-modal understanding. In Proc ICCV 2021.  Open Access NA  
Kirillov A, Mintun E, Ravi N, Mao H, Rolland C, Gustafson L, Xiao T, Whitehead S, Berg AC, Lo WY, Dollár P. Segment anything. In Proc ICCV 2023.  Open Access NA  
Antol S, Agrawal A, Lu J, Mitchell M, Batra D, Zitnick CL, Parikh D. VQA: Visual question answering. In Proc ICCV 2015.  Open Access NA  
Yang A, Miech A, Sivic J, Laptev I, Schmid C. Zero-shot video question answering via frozen bidirectional language models. In Proc NeurIPS 2022.  Open Access NA  
Alayrac JB, Donahue J, Luc P, Miech A, Barr I, Hasson Y, Lenc K, Mensch A, Millican K, Reynolds M, Ring R. Flamingo: a visual language model for few-shot learning. In Proc NeurIPS 2022.  Open Access NA  
Anderson P, Wu Q, Teney D, Bruce J, Johnson M, Sünderhauf N, Reid I, Gould S, Van Den Hengel A. Vision-and-language navigation: Interpreting visually grounded navigation instructions in real environments. In Proc CVPR 2018.  Open Access NA  
Chen S, Guhur PL, Schmid C, Laptev I. History aware multimodal transformer for vision-and-language navigation. In Proc NeurIPS 2021.  Open Access NA  
Guhur PL, Chen S, Pinel RG, Tapaswi M, Laptev I, Schmid C. Instruction-driven history-aware policies for robotic manipulations. In Proc CoRL 2022.  Open Access NA