SLoMO - ECCV 2026 Workshop

Overview

SLoMO brings together researchers working on story-level understanding of long-form, edited videos, particularly movies and TV episodes. Building on the success of the first edition at ICCV 2025, this second edition keeps the in-person format of invited talks plus a competition, and broadens the discussion to long-form edited media understanding and to how deeper movie comprehension can benefit generative models.

Compared to conventional video tasks, this setting demands: (i) modeling long-range narrative dependencies; (ii) reasoning over complex character relationships; and (iii) understanding editing patterns and cinematography.

The invited talks showcase recent advances in Audio Description (AD), movie understanding, and accessibility for visually impaired audiences.

We address the following key open questions:
- How can we ensure fair evaluation of large vision-language models, particularly with respect to knowledge leakage from movie data?
- How can movie understanding advance accessibility in edited media?
- How can modeling movie structure and cinematography benefit both story-level understanding and movie generation?

The SLoMO Competition evaluates narrative-level reasoning through two complementary tracks: Movie Question Answering (MovieQA) over full story arcs, and Audio Description (AD) Generation, which produces coherent, story-aware narrations to enhance accessibility for visually impaired audiences.

Schedule

The workshop is a half-day session held at ECCV 2026 in Malmö, Sweden, on the afternoon of September 8, 2026. Each invited talk is 25 min plus 5 min discussion. The schedule below is tentative and subject to change.

01:00 – 01:10 pm (10 min): Opening Remarks
01:10 – 01:40 pm (30 min): Invited Talk 1, Prof. Bernard Ghanem (KAUST)
01:40 – 02:10 pm (30 min): Invited Talk 2, Prof. Anna Rohrbach (TU Darmstadt · hessian.AI)
02:10 – 03:10 pm (60 min): SLoMO Competition, result announcements & winners' presentations
03:10 – 03:30 pm (20 min): Coffee Break
03:30 – 04:00 pm (30 min): Invited Talk 3, Dr. Fabian Caba Heilbron (Adobe Research)
04:00 – 04:30 pm (30 min): Invited Talk 4, Dr. Piotr Mirowski (Google DeepMind)
04:30 – 05:00 pm (30 min): Discussion & Closing Remarks

Invited Speakers

Bernard Ghanem (KAUST)
Anna Rohrbach (TU Darmstadt · hessian.AI)
Fabian Caba Heilbron (Adobe Research)
Piotr Mirowski (Google DeepMind)

SLoMO Competition

The SLoMO Competition advances story-level video understanding through two complementary tasks: Movie Question Answering (MovieQA) and Audio Description (AD) Generation. The competition is designed to evaluate long-form, narrative-level reasoning in video-language models, going beyond clip-level perception toward coherent story understanding.

Datasets

- Short-Films 20K (SF20K): a large-scale, publicly available collection of 20,143 self-contained short films totalling 3,684 h. Each film lasts 5–40 min (~11 min on average) across diverse genres, with both automatically generated and manually curated QA pairs.
- Condensed Movie Dataset (CMD-AD): short clips from over 1,432 online movies with professionally annotated Audio Descriptions from AudioVault, temporally aligned with the clips.
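
As a quick sanity check on the SF20K figures above, the average film length follows directly from the total duration and the film count. The short sketch below only redoes that arithmetic; the function name is illustrative and not part of any dataset tooling.

```python
# Figures quoted in the SF20K description above.
NUM_FILMS = 20_143
TOTAL_HOURS = 3_684


def mean_duration_minutes(total_hours: float, num_films: int) -> float:
    """Average film length in minutes, given total hours and number of films."""
    if num_films <= 0:
        raise ValueError("num_films must be positive")
    return total_hours * 60.0 / num_films


if __name__ == "__main__":
    # Prints the mean duration rounded to one decimal place.
    print(f"{mean_duration_minutes(TOTAL_HOURS, NUM_FILMS):.1f} min")
```

The result rounds to about 11 min, in line with the stated average.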

Tracks

SLoMO QA Track: Movie Question Answering (MovieQA)
- Dataset: SF20K (19,071 train / 50 public test / 45 private test movies).
- Metric: LLM-QA-Eval. gpt-4.1-nano compares ground-truth and predicted answers and assigns a binary correctness label; the final score is the percentage of correct answers.

SLoMO AD Track: Audio Description (AD) Generation
- Datasets: CMD-AD (1,332 train / 98 public test / 100 private test movies) + SF20K (zero-shot, 17 public test / 45 private test movies).
- Metric: a single AD score, computed as a weighted combination of the two measures below. The exact weights are not disclosed.
  - ADQA: a question-answering measure of how well the ADs convey the story's visual content, leveraging gemini-3.0-flash.
  - CIDEr: n-gram agreement with reference ADs.
- AD Duration Check: conformance to the expected length of audio descriptions, performed for each AD as a qualifying check. It does not contribute to the overall score.
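
To make the two scoring rules above concrete, here is a minimal sketch, not the official evaluation code. It assumes the binary correctness labels have already been produced by the LLM judge, that ADQA and CIDEr are reported on comparable scales, and that the AD score is a convex combination. The weight `w_adqa` is a placeholder, since the real weights are not disclosed, and the function names are hypothetical. The AD Duration Check is left out because it does not enter the score.

```python
from typing import Sequence


def qa_accuracy(labels: Sequence[bool]) -> float:
    """QA track: percentage of answers the judge marked correct.

    `labels` holds one binary correctness label per question; producing
    those labels (the LLM comparison step) is outside this sketch.
    """
    if not labels:
        raise ValueError("no labels to score")
    return 100.0 * sum(bool(x) for x in labels) / len(labels)


def ad_score(adqa: float, cider: float, w_adqa: float = 0.5) -> float:
    """AD track: weighted combination of ADQA and CIDEr.

    ASSUMPTION: weights sum to 1 and w_adqa = 0.5 is a placeholder;
    the competition does not disclose the actual weights.
    """
    if not 0.0 <= w_adqa <= 1.0:
        raise ValueError("w_adqa must lie in [0, 1]")
    return w_adqa * adqa + (1.0 - w_adqa) * cider


if __name__ == "__main__":
    # Three of four answers judged correct.
    print(qa_accuracy([True, False, True, True]))
    # Hypothetical metric values with the placeholder weight.
    print(ad_score(adqa=0.6, cider=0.4))
```

Because the weights are unknown, a local `ad_score` value is useful only for comparing one's own runs, not for predicting leaderboard numbers.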

Important Dates

29 Jun, 2026: Competition server launches with the public test set
1 Aug, 2026: Competition server launches with the private test set
25 Aug, 2026: Submission deadline for leaderboard ranking
1 Sep, 2026: Final rankings announced
8 Sep, 2026: Workshop at ECCV 2026 with winners' presentations

Organizers

Junyu Xie (University of Oxford)
Ridouane Ghermi (École Polytechnique, IP Paris)
Divy Kala (IIIT Hyderabad)
Tengda Han (Google DeepMind)
Max Bain (Google DeepMind)
Arsha Nagrani (Google DeepMind)
Xi Wang (École Polytechnique, IP Paris)
Vicky Kalogeiton (École Polytechnique, IP Paris)
Gül Varol (École des Ponts ParisTech)
Weidi Xie (Shanghai Jiao Tong University)
Makarand Tapaswi (IIIT Hyderabad)
Ivan Laptev (MBZUAI)
Andrew Zisserman (University of Oxford)