Workshop accepted! — Monday, 13 July 2026 • Morning Session • RSS 2026

It’s the demos: A Deep Look at the Role of Demonstration Quality in Imitation-Based Robot Manipulation @ RSS 2026

Imitation-based robot manipulation has entered a phase where short- to mid-horizon tasks that can be unambiguously teleoperated have a high likelihood of success. Further progress is increasingly driven by large-scale demonstration collection in real-world settings rather than by purely algorithmic novelty. Modern pipelines, ranging from behavior cloning to diffusion- and transformer-based policies, often rely on thousands to millions of frames of demonstrations gathered via teleoperation from expert users. In this regime, demonstration quality is a first-class variable: differences in operator skill and style (“robot wizards”), interface ergonomics, latency, action representations, and curation choices can decisively shape downstream policy performance, robustness, and generalization. Despite this, the community lacks a shared vocabulary, metrics, and engineering playbook for what “high-quality” demonstrations mean across tasks and embodiments, and for how to reliably produce them at scale.

The workshop targets researchers and practitioners who build, use, or maintain demonstration datasets for manipulation learning. The intended audience includes: (i) method developers working on learning-from-demonstration for dexterous manipulation, mobile manipulation, and bimanual tasks; (ii) dataset builders and infrastructure engineers responsible for logging, synchronization, calibration, and labeling; and (iii) practitioners deploying imitation-based manipulation in industrial or field environments where reliability and repeatability are central. We expect strong relevance for groups collecting demonstrations “in the wild,” where demonstrations are produced by multiple operators with varying expertise, under time pressure, and with imperfect sensing, occlusions, or changing scene conditions.

Presenters and panelists will be drawn from multiple sub-communities that rarely meet in a single forum: robot manipulation learning (BC, diffusion/transformer policies, hybrid IL/RL), human-in-the-loop systems (teleoperation, XR interfaces, shared autonomy and interventions), dataset and benchmark design (metadata standards, evaluation protocols, reproducibility), and industry robotics (quality assurance, safety practices, scalable operations, and documentation). The workshop will explicitly connect “front-end” data collection decisions to “back-end” learning and evaluation outcomes, encouraging speakers to share concrete lessons from building and stress-testing real pipelines.

The workshop complements RSS main-track research by focusing on best practices and practical failure modes that are often omitted from papers but strongly determine success in practice. Key themes include: best practices for real-world data collection; how to train operators; how to characterize and manage operator-to-operator variation; how to design collection protocols that reduce reliance on rare “wizard” skill; how to define and measure quality beyond task success (e.g., coverage, consistency, uncertainty, recoveries, and intent clarity); how to implement quality-aware filtering, weighting, and dataset documentation; and how to report dataset properties in ways that make results comparable across labs. The expected outcome is a community-curated set of actionable recommendations, protocols, and checklists, as well as open problems. The final goal is to improve reproducibility, reduce wasted data collection, and accelerate progress toward reliable imitation-based manipulation in 2026 and beyond.
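To make one of these themes, quality-aware weighting, concrete: in its simplest form, per-demonstration quality scores enter training as sample weights on the imitation objective. The sketch below assumes a plain MSE behavior-cloning loss in PyTorch, with illustrative names and shapes (a generic `policy` callable, a `batch` dict carrying a `weight` field); it is a starting point for discussion, not a statement about any particular pipeline.

```python
import torch

def weighted_bc_loss(policy, batch):
    """Behavior cloning with per-sample quality weights.

    batch["obs"]:    (B, obs_dim) observations
    batch["action"]: (B, act_dim) demonstrated actions
    batch["weight"]: (B,) quality scores in [0, 1] attached during curation
    """
    pred = policy(batch["obs"])                                # (B, act_dim)
    per_sample = ((pred - batch["action"]) ** 2).mean(dim=-1)  # (B,)
    w = batch["weight"]
    # Normalize by the total weight so the loss scale stays invariant
    # to how aggressively the dataset was down-weighted.
    return (w * per_sample).sum() / w.sum().clamp_min(1e-8)
```

Whether such soft weighting beats hard filtering, and where the weights should come from in the first place, is precisely the kind of question the workshop aims to turn from anecdote into testable hypothesis.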

Content

Topics

We will focus discussions around concrete, practice-driven questions that connect demonstration collection decisions to downstream imitation-learning performance:

  • What is “demonstration quality” beyond task success? Which dimensions matter most in manipulation (e.g., coverage, consistency, smoothness, recoveries, intent clarity, viewpoint/occlusion)?
  • Operator variation and the “robot wizard” effect. How large is inter-operator variability in real datasets? What behaviors distinguish consistently high-performing operators? When do “wizard” demos help vs. hurt generalization?
  • Teleoperation/XR interface factors that change the data distribution. How do latency, control mappings, action space choices, assistive autonomy, camera viewpoints, haptics, and operator feedback cues affect demo quality and policy outcomes?
  • Quality control at scale. What lightweight checks are effective in large-scale pipelines (automatic heuristics, consistency checks, anomaly detection, review queues)? What should be measured routinely? (See the sketch following this list.)
  • Curation strategies: keep, discard, or down-weight? When is it better to discard poor demos than to keep them with reduced weight? How do we avoid biasing the dataset by over-filtering? What metadata is essential?
  • Dataset documentation and reporting standards. What minimal metadata should accompany demonstrations to enable reproducibility? (An illustrative record also follows this list.)
  • Evaluation: linking demo properties to policy performance. Which benchmarks and protocols best reveal demo-quality effects?
  • From anecdotes to hypotheses (and experiments). What recurring practical observations can be turned into testable hypotheses?
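To ground the quality-control and curation questions above, the following is a minimal sketch of lightweight, automatic per-demonstration checks a large-scale pipeline might run. Everything in it is an illustrative assumption: the jerk-based smoothness score, the idle-fraction heuristic, the field names (`ee_pos`, `dt`), and the thresholds are conversation starters, not a recommended standard.

```python
import numpy as np

def smoothness_score(ee_pos, dt):
    """Mean squared jerk of an end-effector position trace (lower = smoother).
    ee_pos: (T, 3) array of positions sampled at a fixed period dt [s]."""
    jerk = np.diff(ee_pos, n=3, axis=0) / dt**3   # third finite difference
    return float(np.mean(np.sum(jerk**2, axis=-1)))

def idle_fraction(ee_pos, dt, speed_eps=1e-3):
    """Fraction of timesteps where the end effector is nearly stationary."""
    speed = np.linalg.norm(np.diff(ee_pos, axis=0), axis=-1) / dt
    return float(np.mean(speed < speed_eps))

def curate(demo, stats):
    """Map raw quality signals to a curation decision: hard-filter extreme
    outliers, soft-weight the rest. `stats` holds dataset-level quantiles
    computed in a first pass over all demos."""
    s = smoothness_score(demo["ee_pos"], demo["dt"])
    if s > stats["smoothness_q99"]:
        return 0.0                    # extreme outlier: discard
    idle = idle_fraction(demo["ee_pos"], demo["dt"])
    # Down-weight mostly-idle demos instead of discarding them outright,
    # to avoid the over-filtering bias flagged in the curation bullet.
    return float(np.clip(1.0 - idle, 0.1, 1.0))
```

A first pass over the dataset computes the reference quantiles in `stats`; a second pass assigns each demo a weight, with zero meaning discard.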
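On the documentation side, a minimal per-demonstration metadata record might look like the following. The fields are hypothetical; converging on a shared, minimal schema is exactly the kind of community outcome the workshop hopes to produce.

```python
from dataclasses import dataclass

@dataclass
class DemoMetadata:
    """Illustrative minimal metadata to log per demonstration.
    Field names are hypothetical, not a proposed standard."""
    demo_id: str
    task: str                        # e.g., "pick_place_mug"
    operator_id: str                 # pseudonymous, enables inter-operator analyses
    operator_experience_hours: float
    interface: str                   # e.g., "vr_headset", "spacemouse", "kinesthetic"
    control_rate_hz: float
    action_space: str                # e.g., "ee_delta_pose", "joint_velocity"
    mean_latency_ms: float
    success: bool
    interventions: int               # operator corrections or takeovers
    notes: str = ""                  # free-form: occlusions, resets, anomalies
```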
Workshop format

The workshop will include:

  • Invited talks: 25 min presentation + 5 min Q&A.
  • Accepted extended abstracts and Reflections (3 pages, with unlimited references and appendix), presented in poster sessions and selected spotlight talks. In case of a hybrid or virtual workshop, we will ask for pre-recorded spotlight talks so that execution stays smooth despite connection issues; however, at least one author of each selected contribution must be present during the workshop for a live Q&A session.
  • Structured small-group discussion to extract shared observations. We will run a 35-minute moderated breakout where participants are randomly assigned to mixed groups of 5–8 (balancing seniority, institutions, and backgrounds). Each group will use a short worksheet to: (i) list practical “rules of thumb” they believe; (ii) identify overlaps across members’ experiences; (iii) convert overlaps into candidate hypotheses (e.g., which interface factors correlate with higher-quality demos; which curation steps reduce failure); and (iv) propose minimal experiments or metrics to test them. Groups will report back key hypotheses to the room, and we will compile them into a shared post-workshop document.

Award

INCAR Robotics AB will sponsor an audience-voted award for the Most Useful Practical Information, intended to reward clarity, honesty about failure modes, and actionable guidance rather than polished results. The award consists of a Meta Quest 3S + peripherals and a one-year license for the INCAR teleoperation/learning software stack. Voting will be open to all attendees to amplify community preferences and encourage concise, transferable best practices.

Schedule (Tentative)

Half-day morning workshop — Monday, 13 July 2026 — Sydney, Australia (AEST, UTC+10)


Time (AEST) Activity
08:00 – 08:10 Workshop Opening
08:10 – 08:40 Invited Talk #1 (25 min talk + 5 min Q&A)
08:40 – 09:10 Lightning Talks #1 (3 × (7 min talk + 3 min Q&A))
09:10 – 09:40 Invited Talk #2 (25 min talk + 5 min Q&A)
09:40 – 10:30 Coffee Break + Poster Presentation
10:30 – 11:00 Invited Talk #3 (25 min talk + 5 min Q&A)
11:00 – 11:20 Lightning Talks #2 (2 × (7 min talk + 3 min Q&A))
11:20 – 11:55 Structured Group Discussion and Presentation
11:55 – 12:00 Closing & Award Ceremony

Call for Papers

We invite participants to submit short contributions (3 pages extended abstract, unlimited references and appendix) focused on practical lessons from demonstration collection and curation: what failed, what worked, what metrics or checks were useful, and what factors mattered.

Submissions will be accepted primarily for poster presentation, with a curated subset invited for lightning talks (7 min + 3 min Q&A) emphasizing actionable takeaways and an “anecdote → hypothesis” framing. We will reserve a substantial fraction of lightning slots for students, postdocs, and first-time RSS participants.

Contributions are encouraged, but not required, to present original work. The review process will be single-blind (submitted papers do not need to be anonymized). Accepted abstracts will be made available on the workshop website but will not appear in the official RSS proceedings.

Important Dates

Submission portal and exact dates will be announced here. Check back soon.

Invited Speakers

Nadia Figueroa
Assistant Professor
University of Pennsylvania, USA
Personal website

Dongheui Lee
Full Professor
Technische Universität Wien (TU Wien), Austria
Personal website

Dana Kulic
Professor
Monash University, Australia
Personal website

Organizers

  • Michael C. Welle - INCAR Robotics AB, Sweden
  • Jonne van Haastregt - INCAR Robotics AB, Sweden
  • Durgesh Haribhau Salunkhe - École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
  • Andrej Gams - Jozef Stefan Institute (JSI), Slovenia
  • Sthithpragya Gupta - École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
  • João Silvério - German Aerospace Center (DLR), Germany
  • Niko Suenderhauf - Queensland University of Technology (QUT), Australia
  • Aude Billard - École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
  • Danica Kragic - KTH Royal Institute of Technology, Sweden

Contact

If you have any questions, please contact Michael Welle by email: MichaelDOTWelleATincar-roboticsDOTse

Acknowledgment

The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.