Content
Topics
We will focus discussions on concrete, practice-driven questions that connect demonstration-collection decisions to downstream imitation-learning performance:
- What is “demonstration quality” beyond task success? Which dimensions matter most in manipulation (e.g., coverage, consistency, smoothness, recoveries, intent clarity, viewpoint/occlusion)?
- Operator variation and the “robot wizard” effect. How large is inter-operator variability in real datasets? What behaviors distinguish consistently high-performing operators? When do “wizard” demos help vs. hurt generalization?
- Teleoperation/XR interface factors that change the data distribution. How do latency, control mappings, action space choices, assistive autonomy, camera viewpoints, haptics, and operator feedback cues affect demo quality and policy outcomes?
- Quality control at scale. What lightweight checks are effective in large-scale pipelines (automatic heuristics, consistency checks, anomaly detection, review queues)? What should be measured routinely? (A sketch of such checks follows this list.)
- Curation strategies: keep, discard, filter, or weight? When is it better to discard poor demos, and when to keep them with reduced weights? How do we avoid biasing the dataset by over-filtering? What metadata is essential?
- Dataset documentation and reporting standards. What minimal metadata should accompany demonstrations to enable reproducibility? (An illustrative record follows this list.)
- Evaluation: linking demo properties to policy performance. Which benchmarks and protocols best reveal demo-quality effects?
- From anecdotes to hypotheses (and experiments). What recurring practical observations can be turned into testable hypotheses?
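To ground the quality-control and curation questions above, the following Python sketch illustrates one way lightweight per-demonstration heuristics could feed a keep/discard/weight decision. It is a minimal sketch, not a recommendation: the function names, metric choices, and thresholds (`dt`, `idle_eps`, `jerk_scale`, `max_idle`) are all hypothetical.

```python
import numpy as np

def demo_quality_heuristics(actions, dt=0.05, idle_eps=1e-3):
    """Cheap per-demonstration signals; all names and thresholds are illustrative.

    actions: (T, D) array of commanded joint or end-effector actions.
    """
    actions = np.asarray(actions, dtype=float)
    vel = np.diff(actions, axis=0) / dt   # finite-difference velocity
    jerk = np.diff(vel, axis=0) / dt      # second difference as a smoothness proxy
    return {
        "duration_s": len(actions) * dt,
        "mean_abs_jerk": float(np.mean(np.abs(jerk))),
        # fraction of steps where the commanded motion is near zero
        "idle_fraction": float(np.mean(np.linalg.norm(vel, axis=1) < idle_eps)),
    }

def demo_weight(metrics, jerk_scale=50.0, max_idle=0.5):
    """Map heuristic scores to a training weight in [0, 1].

    Mostly-idle demos are rejected outright; jerky ones are down-weighted
    rather than discarded, so "filter vs. weight" becomes a single
    tunable knob in the curation pipeline.
    """
    if metrics["idle_fraction"] > max_idle:
        return 0.0
    return float(np.exp(-metrics["mean_abs_jerk"] / jerk_scale))
```

Scores like these could serve as per-demo weights in a behavior-cloning loss or as triage inputs to a review queue; which checks actually correlate with policy performance is exactly the open question the workshop aims to discuss.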
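On the documentation question, a minimal per-demonstration metadata record might look like the sketch below. The field set is an assumption offered as a starting point for discussion, not a proposed standard.

```python
from dataclasses import dataclass, field

@dataclass
class DemoMetadata:
    """Illustrative minimal metadata for one demonstration (hypothetical fields)."""
    demo_id: str
    task: str
    operator_id: str                  # pseudonymous; enables operator-variation studies
    interface: str                    # e.g., "VR", "kinesthetic", "spacemouse"
    control_rate_hz: float
    camera_views: list[str] = field(default_factory=list)
    success: bool = True
    interventions: int = 0            # assistive-autonomy takeovers, if any
    notes: str = ""                   # recoveries, anomalies, hardware quirks
```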
Workshop format
The workshop will include:
- Invited talks: 25-minute presentation + 5 minutes of Q&A.
- Accepted extended abstracts and Reflections (3 pages, with unlimited references and appendix), presented in poster sessions and selected spotlight talks. In the case of a hybrid or virtual workshop, we will ask for pre-recorded spotlight talks so the program runs smoothly if connection issues arise; however, at least one author of each selected contribution will be required to be present during the workshop for a live Q&A session.
- Structured small-group discussion to extract shared observations. We will run a 35-minute moderated breakout in which participants are randomly assigned to mixed groups of 5–8 (balanced across seniority, institution, and background). Each group will use a short worksheet to: (i) list practical “rules of thumb” they believe in, (ii) identify overlaps across members’ experiences, (iii) convert those overlaps into candidate hypotheses (e.g., which interface factors correlate with higher-quality demonstrations; which curation steps reduce failures), and (iv) propose minimal experiments or metrics to test them. Groups will report their key hypotheses back to the room, and we will compile them into a shared post-workshop document.