Overview
SLoMO brings together researchers working on story-level understanding of long-form, edited videos, particularly movies and TV episodes. Building on the success of the first edition at ICCV 2025, this second edition keeps the format of invited talks plus a competition, with in-person attendance, and broadens the discussion to long-form edited media understanding and to how deeper movie comprehension can benefit generative models.
Compared to conventional video tasks, this setting demands: (i) modeling long-range narrative dependencies; (ii) reasoning over complex character relationships; and (iii) understanding editing patterns and cinematography.
Invited Talks showcase recent advances in Audio Description (AD), movie understanding, and accessibility for visually impaired audiences.
We address the following key open questions:
- How can we ensure fair evaluation of large vision-language models, particularly with respect to knowledge leakage from movie data?
- How can movie understanding advance accessibility in edited media?
- How can modeling movie structure and cinematography benefit both story-level understanding and movie generation?
The SLoMO Competition evaluates narrative-level reasoning through two complementary tracks: Movie Question Answering (MovieQA) over full story arcs, and Audio Description (AD) Generation, which produces coherent, story-aware narrations to enhance accessibility for visually impaired audiences.