SLoMO - ICCV 2025 Workshop

Overview

SLoMO aims to bring together researchers working on the understanding of long-form, edited videos—such as movies and TV episodes. We focus on two key aspects:

- Audio Description (AD) Generation focuses on producing concise, coherent, story-driven narrations for blind and visually impaired (BVI) audiences, complementary to the information provided by the original audio.
- Movie Question Answering evaluates a model's ability to comprehend narratives, particularly through story-level reasoning and long-context modeling.

We host a series of invited talks addressing key open questions:
- How can story-level information be effectively perceived and utilized in downstream tasks?
- How can fair evaluation be ensured (e.g., for AD), and how can data leakage into large-scale pre-trained models be minimized?
- What are the current limitations of Audio Descriptions, and what are the next steps toward practical automatic AD generation?

To advance research in this direction, we present the Short-Films 20K (SF20K) Competition for story-level movie understanding.

Schedule

The workshop will be held on the morning of 19 October 2025 at the Honolulu Convention Center.

Time	Session	Speaker / Details
09:20 - 09:30	Opening remarks
09:30 - 10:00	Invited Talk 1	Prof. Amy Pavel (UT Austin)
10:00 - 10:30	Invited Talk 2	Prof. Anna Rohrbach (TU Darmstadt)
10:30 - 11:20	SF20K Competition	Result announcements & presentations
11:20 - 11:40	Coffee break
11:40 - 12:10	Invited Talk 3	Prof. Mike Zheng Shou (NUS)
12:10 - 12:40	Invited Talk 4	Prof. Makarand Tapaswi (IIIT Hyderabad)
12:40 - 13:00	Closing remarks

Invited Speakers

Amy Pavel (University of Texas at Austin)
Anna Rohrbach (TU Darmstadt · hessian.AI)
Mike Zheng Shou (National University of Singapore)
Makarand Tapaswi (IIIT Hyderabad · Wadhwani AI)

Short-Films 20K (SF20K) Competition

The Short-Films 20K (SF20K) Competition aims to advance story-level video understanding by leveraging the new SF20K dataset. While recent multimodal models have demonstrated progress in video understanding, existing benchmarks are largely limited to short videos with simple narratives. In contrast, our competition focuses on complex, long-term reasoning in storytelling by introducing multiple-choice and open-ended question answering tasks.
The competition is based on SF20K-Test-Expert, a subset of the SF20K dataset, which includes manually crafted open-ended questions. The questions are designed to be challenging, requiring long-term reasoning and multimodal understanding of the video content.

Competition Format

This edition features two tracks focused on Open-Ended Video Question Answering:

Main track - Unlimited model size
This track allows for models of any size, encouraging participants to push the limits of performance in video understanding. Participate via the SF20K Competition platform.
Special track - Constrained model size (<8B)
This track challenges participants to build efficient models with fewer than 8 billion parameters, focusing on innovative solutions. Top-scoring teams will be required to provide reproducible code. Participate via the [CONSTRAINED MODEL SIZE] SF20K Competition platform.

Competition Phases

The competition is divided into two phases:

Phase 1: Public Test Set Evaluation
During this initial phase, methods are evaluated against the public test set. This phase is primarily for validation purposes, allowing participants to refine their approaches.
Phase 2: Private Test Set Evaluation
For the private leaderboard, we will evaluate models on new, unreleased movies to prevent data contamination. This private test set will be released on October 3rd, 2025. Final rankings will be determined based on performance in this phase.

Important Dates

01 Jul, 2025: The competition server launches with data from the public test set
03 Oct, 2025: Submission deadline for leaderboard ranking on the public test set
03-10 Oct, 2025: Private test set evaluation
10 Oct, 2025: Final rankings announced on the private test set
19 Oct, 2025: Workshop at ICCV 2025 with winners' presentations

Organizers

Junyu Xie (University of Oxford)
Ridouane Ghermi (École Polytechnique, IP Paris)
Tengda Han (Google DeepMind)
Max Bain (Google DeepMind)
Arsha Nagrani (Google DeepMind)
Gül Varol (École des Ponts ParisTech)
Weidi Xie (Shanghai Jiao Tong University)
Vicky Kalogeiton (École Polytechnique, IP Paris)
Ivan Laptev (MBZUAI)
Andrew Zisserman (University of Oxford)