- 
                                    
                                        AToken: A Unified Tokenizer for Vision
                                        Jiasen Lu*, Liangchen Song*, Mingze Xu, Byeongjoo Ahn, Yanjun Wang, Chen Chen, Afshin Dehghan, Yinfei Yang
                                        Technical Report, 2025
                                        
                                            [PDF]
                                        
                                    
                                 
                                - 
                                    
                                        UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation
                                        Rui Tian*, Mingfei Gao*, Mingze Xu*, Jiaming Hu, Jiasen Lu, Zuxuan Wu, Yinfei Yang, Afshin Dehghan
                                        Conference on Neural Information Processing Systems (NeurIPS), 2025
                                        
                                            [PDF]
                                        
                                    
                                 
                                - 
                                    
                                        StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
                                        Haibo Wang, Bo Feng, Zhengfeng Lai, Mingze Xu, Shiyu Li, Weifeng Ge, Afshin Dehghan, Meng Cao, Ping Huang
                                        Conference on Neural Information Processing Systems (NeurIPS), 2025
                                        
                                            [PDF]
                                        
                                    
                                 
                                - 
                                    
                                        SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding
                                        Mingze Xu*, Mingfei Gao*, Shiyu Li*, Jiasen Lu, Zhe Gan, Zhengfeng Lai, Meng Cao, Kai Kang, Yinfei Yang, Afshin Dehghan
                                        Conference on Language Modeling (COLM), 2025
                                        
                                            [PDF]
                                        
                                    
                                 
                                - 
                                    
                                        MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
                                        Apple MM1.5 Team
                                        International Conference on Learning Representations (ICLR), 2025
                                        
                                            [PDF]
                                        
                                    
                                 
                                - 
                                    
                                        SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
                                        Mingze Xu*, Mingfei Gao*, Zhe Gan, Hong-You Chen, Zhengfeng Lai, Haiming Gang, Kai Kang, Afshin Dehghan
                                        Technical Report, 2024
                                        
                                            [PDF]
                                            [Code]
                                        
                                    
                                 
                                - 
                                    
                                        SkeleTR: Towards Skeleton-based Action Recognition in the Wild
                                        Haodong Duan, Mingze Xu, Bing Shuai, Davide Modolo, Zhuowen Tu, Joseph Tighe, Alessandro Bergamo
                                        IEEE International Conference on Computer Vision (ICCV), 2023
                                        
                                            [PDF]
                                        
                                    
                                 
                                - 
                                    
                                        An In-depth Study of Stochastic Backpropagation
                                        Jun Fang, Mingze Xu†, Hao Chen, Bing Shuai, Zhuowen Tu, Joseph Tighe
                                        Conference on Neural Information Processing Systems (NeurIPS), 2022
                                        
                                            [PDF]
                                            [Code]
                                        
                                    
                                 
                                - 
                                    
                                        MeMOT: Multi-Object Tracking with Memory
                                        Jiarui Cai, Mingze Xu†, Wei Li, Yuanjun Xiong, Wei Xia, Zhuowen Tu, Stefano Soatto
                                        IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022 (Oral)
                                        
                                            [PDF]
                                        
                                    
                                 
                                - 
                                    
                                        Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models
                                        Feng Cheng, Mingze Xu†, Yuanjun Xiong, Hao Chen, Xinyu Li, Wei Li, Wei Xia
                                        IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022 (Oral)
                                        
                                            [PDF]
                                            [Code]
                                        
                                    
                                 
                                - 
                                    
                                        TubeR: Tubelet Transformer for Video Action Detection
                                        Jiaojiao Zhao*, Yanyi Zhang*, Xinyu Li*, Hao Chen, Bing Shuai, Mingze Xu, Chunhui Liu, Kaustav Kundu, Yuanjun Xiong, Davide Modolo, Ivan Marsic, Cees Snoek, Joseph Tighe
                                        IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022 (Oral)
                                        
                                            [PDF]
                                            [Code]
                                        
                                    
                                 
                                - 
                                    
                                        DoTA: Unsupervised Detection of Traffic Anomaly in Driving Videos
                                        Yu Yao, Xizi Wang, Mingze Xu, Zelin Pu, Yuchen Wang, Ella Atkins, David Crandall
                                        IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2022
                                        
                                            [PDF]
                                            [Data]
                                        
                                    
                                 
                                - 
                                    
                                        Stepwise Goal-Driven Networks for Trajectory Prediction
                                        Chuhua Wang*, Yuchen Wang*, Mingze Xu, David Crandall
                                        IEEE Robotics and Automation Letters (RA-L), 2022
                                        
                                            [PDF]
                                            [Code]
                                        
                                    
                                 
                                - 
                                    
                                        Long Short-Term Transformer for Online Action Detection
                                        Mingze Xu, Yuanjun Xiong, Hao Chen, Xinyu Li, Wei Xia, Zhuowen Tu, Stefano Soatto
                                        Conference on Neural Information Processing Systems (NeurIPS), 2021 (Spotlight)
                                        
                                            [PDF]
                                            [Project page]
                                        
                                    
                                 
                                - 
                                    
                                        Learning Self-Consistency for Deepfake Detection
                                        Tianchen Zhao, Xiang Xu, Mingze Xu, Hui Ding, Yuanjun Xiong, Wei Xia
                                        IEEE International Conference on Computer Vision (ICCV), 2021 (Oral)
                                        
                                            [PDF]
                                        
                                    
                                 
                                - 
                                    
                                        Temporal Recurrent Networks for Online Action Detection
                                        Mingze Xu*, Mingfei Gao*, Yi-Ting Chen, Larry Davis, David Crandall
                                        IEEE International Conference on Computer Vision (ICCV), 2019
                                        
                                            [PDF]
                                            [Code]
                                        
                                    
                                 
                                - 
                                    
                                        StartNet: Online Detection of Action Start in Untrimmed Videos
                                        Mingfei Gao, Mingze Xu, Larry Davis, Richard Socher, Caiming Xiong
                                        IEEE International Conference on Computer Vision (ICCV), 2019
                                        
                                            [PDF]
                                        
                                    
                                 
                                - 
                                    
                                        Embodied Amodal Recognition: Learning to Move to Perceive Objects
                                        Jianwei Yang*, Zhile Ren*, Mingze Xu, Xinlei Chen, David Crandall, Devi Parikh, Dhruv Batra
                                        IEEE International Conference on Computer Vision (ICCV), 2019
                                        
                                            [PDF]
                                        
                                    
                                 
                                - 
                                    
                                        Unsupervised Traffic Accident Detection in First-Person Videos
                                        Mingze Xu*, Yu Yao*, Yuchen Wang, David Crandall, Ella Atkins
                                        IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019
                                        
                                            [PDF]
                                            [Code]
                                        
                                    
                                 
                                - 
                                    
                                        Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems
                                        Yu Yao, Mingze Xu, Chiho Choi, David Crandall, Ella Atkins, Behzad Dariush
                                        IEEE International Conference on Robotics and Automation (ICRA), 2019
                                        
                                            [PDF]
                                            [Code]
                                        
                                    
                                 
                                - 
                                    
                                        Joint Person Segmentation and Identification in Synchronized First- and Third-Person Videos
                                        Mingze Xu, Chenyou Fan, Yuchen Wang, Michael Ryoo, David Crandall
                                        European Conference on Computer Vision (ECCV), 2018
                                        
                                            [PDF]
                                            [Project page]