CV806: Advanced Topics in Vision and Language
TB = Textbook or Required reading
REF = Reference or supplemental reading
| Type | Title | Access | eBook | Call Number |
| --- | --- | --- | --- | --- |
|  | Szeliski R. Computer Vision: Algorithms and Applications. 2nd ed. Springer, 2022. | Springer | Yes | TA1634 .S97 2022 |
|  | Berg TL, Berg AC, Edwards J, Maire M, White R, Teh YW, Learned-Miller E, Forsyth DA. Names and faces in the news. In Proc CVPR 2004, pp. 848–854. doi:10.1109/CVPR.2004.1315253 | NA |  |  |
|  | Bojanowski P, Bach F, Laptev I, Ponce J, Schmid C, Sivic J. Finding actors and actions in movies. In Proc ICCV 2013, pp. 2280–2287. | NA |  |  |
|  | Miech A, Alayrac JB, Smaira L, Laptev I, Sivic J, Zisserman A. End-to-end learning of visual representations from uncurated instructional videos. In Proc CVPR 2020. | NA |  |  |
|  | Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Krueger G. Learning transferable visual models from natural language supervision. In Proc ICML 2021. | Open Access | NA |  |
|  | Li J, Li D, Xiong C, Hoi S. BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In Proc ICML 2022. | ProQuest | Yes | QA641 .C33 2016 |
|  | Miech A, Alayrac JB, Laptev I, Sivic J, Zisserman A. Thinking fast and slow: Efficient text-to-visual retrieval with transformers. In Proc CVPR 2021. | NA |  |  |
|  | Kamath A, Singh M, LeCun Y, Synnaeve G, Misra I, Carion N. MDETR: Modulated detection for end-to-end multi-modal understanding. In Proc ICCV 2021. | Open Access | NA |  |
|  | Kirillov A, Mintun E, Ravi N, Mao H, Rolland C, Gustafson L, Xiao T, Whitehead S, Berg AC, Lo WY, Dollár P. Segment Anything. In Proc ICCV 2023. | Open Access | NA |  |
|  | Antol S, Agrawal A, Lu J, Mitchell M, Batra D, Zitnick CL, Parikh D. VQA: Visual question answering. In Proc ICCV 2015. | Open Access | NA |  |
|  | Yang A, Miech A, Sivic J, Laptev I, Schmid C. Zero-shot video question answering via frozen bidirectional language models. In Proc NeurIPS 2022. | Open Access | NA |  |
|  | Alayrac JB, Donahue J, Luc P, Miech A, Barr I, Hasson Y, Lenc K, Mensch A, Millican K, Reynolds M, Ring R. Flamingo: A visual language model for few-shot learning. In Proc NeurIPS 2022. | Open Access | NA |  |
|  | Anderson P, Wu Q, Teney D, Bruce J, Johnson M, Sünderhauf N, Reid I, Gould S, Van Den Hengel A. Vision-and-language navigation: Interpreting visually grounded navigation instructions in real environments. In Proc CVPR 2018. | Open Access | NA |  |
|  | Chen S, Guhur PL, Schmid C, Laptev I. History aware multimodal transformer for vision-and-language navigation. In Proc NeurIPS 2021. | Open Access | NA |  |
|  | Guhur PL, Chen S, Pinel RG, Tapaswi M, Laptev I, Schmid C. Instruction-driven history-aware policies for robotic manipulations. In Proc CoRL 2022. | Open Access | NA |  |