We introduce EgoPoints, a benchmark for point tracking in egocentric videos. We annotate 4.7K challenging tracks in egocentric sequences, including 9x more points that go out-of-view and 59x more points that require re-identification (ReID) after returning to view, compared to the popular TAP-Vid-DAVIS evaluation benchmark. To measure model performance on such challenging points, we introduce evaluation metrics that specifically monitor tracking on in-view points, out-of-view points, and points that require re-identification.
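The metric definitions are not spelled out here, so the following is a minimal sketch of how they could be computed, assuming δavg follows the TAP-Vid convention (fraction of points within 1, 2, 4, 8, 16 pixels of ground truth, averaged over thresholds) and that OOVA/IVA score how often the predicted visibility flag agrees with ground truth on out-of-view and in-view points respectively. The function name and array layout are ours, not the benchmark's API.

```python
import numpy as np

def compute_point_metrics(pred_xy, pred_vis, gt_xy, gt_vis,
                          thresholds=(1, 2, 4, 8, 16)):
    """Hypothetical evaluation sketch for the metrics reported above.

    pred_xy, gt_xy   : (T, N, 2) float arrays of point coordinates.
    pred_vis, gt_vis : (T, N) bool arrays; True = point is in view.
    """
    # Per-point pixel error at every frame.
    dist = np.linalg.norm(pred_xy - gt_xy, axis=-1)  # (T, N)

    # delta_avg (TAP-Vid style, assumed): fraction of ground-truth
    # in-view points tracked within each threshold, averaged over thresholds.
    delta_avg = np.mean([(dist[gt_vis] < t).mean() for t in thresholds])

    # OOVA / IVA (assumed definitions): agreement of the predicted
    # visibility flag with ground truth on out-of-view / in-view points.
    # ReID delta_avg would restrict delta_avg to frames after a point
    # re-enters the view; that bookkeeping is omitted here.
    oova = (~pred_vis[~gt_vis]).mean()
    iva = pred_vis[gt_vis].mean()

    return {"delta_avg": 100 * delta_avg,
            "oova": 100 * oova,
            "iva": 100 * iva}
```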
We then propose a pipeline to create semi-real sequences with automatic ground truth. We generate 11K sequences by combining dynamic Kubric objects with scene points from EPIC Fields. When fine-tuning state-of-the-art methods on these sequences and evaluating on our annotated EgoPoints sequences, we improve CoTracker across all metrics, including the tracking accuracy δ★avg by 2.7 percentage points and accuracy on ReID sequences (ReID δavg) by 2.4 points. We also improve the δ★avg and ReID δavg of PIPs++ by 0.3 and 2.2 points respectively.
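The combination step is described only at a high level. As a rough illustration, below is one plausible way to composite a rendered Kubric object layer over a real EPIC frame and merge the ground-truth tracks, marking scene points occluded wherever the object covers them. All names and the alpha-mask convention are assumptions, not the paper's actual pipeline.

```python
import numpy as np

def composite_frame(real_rgb, obj_rgba):
    """Alpha-blend a rendered object layer (H, W, 4) over a real frame (H, W, 3)."""
    alpha = obj_rgba[..., 3:4] / 255.0
    return (alpha * obj_rgba[..., :3] + (1 - alpha) * real_rgb).astype(np.uint8)

def merge_tracks(scene_xy, scene_vis, obj_xy, obj_vis, obj_alpha):
    """Merge ground-truth tracks for a semi-real sequence.

    scene_xy : (T, Ns, 2) scene-point tracks, obj_xy : (T, No, 2) object tracks.
    scene_vis, obj_vis : bool visibility arrays.
    obj_alpha : (T, H, W) float masks in [0, 1] from the object render.
    """
    T, Ns, _ = scene_xy.shape
    scene_vis = scene_vis.copy()
    for t in range(T):
        xy = np.round(scene_xy[t]).astype(int)
        H, W = obj_alpha[t].shape
        inside = (xy[:, 0] >= 0) & (xy[:, 0] < W) & (xy[:, 1] >= 0) & (xy[:, 1] < H)
        # A scene point becomes occluded when the composited object covers it.
        covered = np.zeros(Ns, dtype=bool)
        covered[inside] = obj_alpha[t][xy[inside, 1], xy[inside, 0]] > 0.5
        scene_vis[t] &= ~covered
    return (np.concatenate([scene_xy, obj_xy], axis=1),
            np.concatenate([scene_vis, obj_vis], axis=1))
```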
We compare the performance of several state-of-the-art point trackers on both TAP-Vid-DAVIS (left) and EgoPoints (right). These comparisons highlight the complexity and challenges introduced by EgoPoints, as top-performing point trackers struggle or degrade in the more difficult scenarios it provides.
| Model | δavg↑ (TAP-Vid-DAVIS) | δavg↑ (EgoPoints) | ReID δavg↑ (EgoPoints) | OOVA↑ (EgoPoints) | IVA↑ (EgoPoints) |
|---|---|---|---|---|---|
| PIPs++ | 64.0 | 36.9 | 14.6 | 50.4 | 89.2 |
| CoTracker | 74.7 | 38.5 | 4.8 | 81.4 | 73.4 |
| BootsTAPIR Online | 65.2 | 39.6 | 0.0 | 0.0 | 100.0 |
| LocoTrack | 75.3 | 59.4 | 0.1 | 0.2 | 99.9 |
| CoTracker v3 | 77.2 | 50.0 | 15.0 | 31.8 | 99.3 |
This work proposes new benchmark annotations that are publicly available, building on the publicly available EPIC-KITCHENS dataset. It is supported by the EPSRC Doctoral Training Program, EPSRC UMPIRE EP/T004991/1 and EPSRC Programme Grant VisualAI EP/T028572/1. We acknowledge the use of the EPSRC-funded Tier 2 facility JADE-II.
@inproceedings{darkhalil2025egopoints,
title={EgoPoints: Advancing Point Tracking for Egocentric Videos},
author={Darkhalil, Ahmad and Guerrier, Rhodri and Harley, Adam W. and Damen, Dima},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
year={2025}
}