We introduce EgoPoints, a benchmark for point tracking in egocentric videos. We annotate 4.7K challenging tracks in egocentric sequences, including 9x more points that go out-of-view and 59x more points that require re-identification (ReID) after returning to view, compared to the popular TAP-Vid-DAVIS evaluation benchmark. To measure model performance on such challenging points, we introduce evaluation metrics that specifically monitor tracking on in-view points, out-of-view points, and points that require re-identification.
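The metric definitions are not spelled out here, so the following is a minimal sketch of how they could be computed, assuming δavg follows the TAP-Vid convention (fraction of points within 1, 2, 4, 8, 16 pixels of ground truth, averaged over thresholds) and that OOVA/IVA score how often the predicted visibility flag agrees with ground truth on out-of-view and in-view points respectively. The function name and array layout are ours, not the benchmark's API.

```python
import numpy as np

def compute_point_metrics(pred_xy, pred_vis, gt_xy, gt_vis,
                          thresholds=(1, 2, 4, 8, 16)):
    """Hypothetical evaluation sketch for the metrics reported above.

    pred_xy, gt_xy   : (T, N, 2) float arrays of point coordinates.
    pred_vis, gt_vis : (T, N) bool arrays; True = point is in view.
    """
    # Per-point pixel error at every frame.
    dist = np.linalg.norm(pred_xy - gt_xy, axis=-1)  # (T, N)

    # delta_avg (TAP-Vid style, assumed): fraction of ground-truth
    # in-view points tracked within each threshold, averaged over thresholds.
    delta_avg = np.mean([(dist[gt_vis] < t).mean() for t in thresholds])

    # OOVA / IVA (assumed definitions): agreement of the predicted
    # visibility flag with ground truth on out-of-view / in-view points.
    # ReID delta_avg would restrict delta_avg to frames after a point
    # re-enters the view; that bookkeeping is omitted here.
    oova = (~pred_vis[~gt_vis]).mean()
    iva = pred_vis[gt_vis].mean()

    return {"delta_avg": 100 * delta_avg,
            "oova": 100 * oova,
            "iva": 100 * iva}
```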
We then propose a pipeline to create semi-real sequences with automatic ground truth. We generate 11K sequences by combining dynamic Kubric objects with scene points from EPIC Fields. When fine-tuning state-of-the-art methods on these sequences and evaluating on our annotated EgoPoints sequences, we improve CoTracker across all metrics, including the tracking accuracy δ★avg by 2.7 percentage points and accuracy on ReID sequences (ReID δavg) by 2.4 points. We also improve the δ★avg and ReID δavg of PIPs++ by 0.3 and 2.2 points respectively.
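The combination step is described only at a high level. As a rough illustration, below is one plausible way to composite a rendered Kubric object layer over a real EPIC frame and merge the ground-truth tracks, marking scene points occluded wherever the object covers them. All names and the alpha-mask convention are assumptions, not the paper's actual pipeline.

```python
import numpy as np

def composite_frame(real_rgb, obj_rgba):
    """Alpha-blend a rendered object layer (H, W, 4) over a real frame (H, W, 3)."""
    alpha = obj_rgba[..., 3:4] / 255.0
    return (alpha * obj_rgba[..., :3] + (1 - alpha) * real_rgb).astype(np.uint8)

def merge_tracks(scene_xy, scene_vis, obj_xy, obj_vis, obj_alpha):
    """Merge ground-truth tracks for a semi-real sequence.

    scene_xy : (T, Ns, 2) scene-point tracks, obj_xy : (T, No, 2) object tracks.
    scene_vis, obj_vis : bool visibility arrays.
    obj_alpha : (T, H, W) float masks in [0, 1] from the object render.
    """
    T, Ns, _ = scene_xy.shape
    scene_vis = scene_vis.copy()
    for t in range(T):
        xy = np.round(scene_xy[t]).astype(int)
        H, W = obj_alpha[t].shape
        inside = (xy[:, 0] >= 0) & (xy[:, 0] < W) & (xy[:, 1] >= 0) & (xy[:, 1] < H)
        # A scene point becomes occluded when the composited object covers it.
        covered = np.zeros(Ns, dtype=bool)
        covered[inside] = obj_alpha[t][xy[inside, 1], xy[inside, 0]] > 0.5
        scene_vis[t] &= ~covered
    return (np.concatenate([scene_xy, obj_xy], axis=1),
            np.concatenate([scene_vis, obj_vis], axis=1))
```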
We compare the performance of several state-of-the-art point trackers on both TAP-Vid-DAVIS (left) and EgoPoints (right). These comparisons highlight the complexity and challenges introduced by EgoPoints, as top-performing point trackers struggle or degrade in the more difficult scenarios it provides.
| Model | δavg↑ (TAP-Vid-DAVIS) | δavg↑ (EgoPoints) | ReID δavg↑ (EgoPoints) | OOVA↑ (EgoPoints) | IVA↑ (EgoPoints) |
|---|---|---|---|---|---|
| PIPs++ | 64.0 | 36.9 | 14.6 | 50.4 | 89.2 |
| CoTracker | 74.7 | 38.5 | 4.8 | 81.4 | 73.4 |
| BootsTAPIR Online | 65.2 | 39.6 | 0.0 | 0.0 | 100.0 |
| LocoTrack | 75.3 | 59.4 | 0.1 | 0.2 | 99.9 |
| CoTracker v3 | 77.2 | 50.0 | 15.0 | 31.8 | 99.3 |
This work proposes new benchmark annotations that are publicly available, building on the publicly available EPIC-KITCHENS dataset. It is supported by the EPSRC Doctoral Training Program, EPSRC UMPIRE EP/T004991/1 and EPSRC Programme Grant VisualAI EP/T028572/1. We acknowledge the use of the EPSRC-funded Tier 2 facility JADE-II.
@inproceedings{darkhalil2025egopoints,
title={EgoPoints: Advancing Point Tracking for Egocentric Videos},
author={Darkhalil, Ahmad and Guerrier, Rhodri and Harley, Adam W. and Damen, Dima},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
year={2025}
}