SelfMOTR

SelfMOTR: Revisiting MOTR with Self-Generating Detection Priors

1 Chair of Imaging and Computer Vision, RWTH Aachen University, Aachen, Germany
2 Heinrich Heine University Düsseldorf, Faculty of Mathematics and Natural Sciences, Machine Learning for Medical Data, Düsseldorf, Germany

Abstract

End-to-end transformer architectures have driven significant progress in multi-object tracking by unifying detection and association into a single, heuristic-free framework. Despite these benefits, poor detection performance and the inherent conflict between detection and association in a joint architecture remain critical concerns. Recent approaches aim to mitigate these issues by employing advanced denoising or label assignment strategies, or by incorporating detection priors from external object detectors. In this paper, we propose SelfMOTR, a simple yet highly effective detector-free alternative that decouples proposal discovery from association using self-generated internal detection priors. Through extensive analysis and ablation studies, we show that end-to-end transformer trackers with joint detection–association decoding retain substantial hidden detection capacity, and we provide a practical detector-free mechanism for leveraging it. To shed light on these joint decoding dynamics, we draw inspiration from attention sink analyses in large language models, leveraging Track Attention Mass to show that standard generic queries exhibit unbalanced attention, frequently struggling to weigh track context against novel object discovery. SelfMOTR achieves highly competitive performance in complex, dynamic environments, yielding 69.2 HOTA on DanceTrack and leading with 71.1 HOTA on the Bird Flock Tracking (BFT) dataset.
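
The abstract does not define Track Attention Mass. One plausible reading, assumed here purely for illustration, is the share of a query's softmax attention that falls on keys belonging to track queries. The sketch below computes that share for a single query; the function names, the key layout, and the example logits are all hypothetical and are not taken from the SelfMOTR code.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def track_attention_mass(scores, is_track_key):
    """Share of one query's attention that lands on track-query keys.

    scores:       raw attention logits of one query over all keys
    is_track_key: booleans, True where the key is a track query
                  (assumed layout, not the paper's definition)
    """
    weights = softmax(scores)
    return sum(w for w, is_track in zip(weights, is_track_key) if is_track)

# Hypothetical example: one query attending over 2 track keys and 2 detection keys.
logits = [2.0, 2.0, 0.0, 0.0]
mask = [True, True, False, False]
mass = track_attention_mass(logits, mask)
print(f"track attention mass: {mass:.3f}")
```

Under this reading, a mass near 1 would mean the query attends almost only to existing tracks, and a mass near 0 would mean it largely ignores track context.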

Results on DanceTrack

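| Model | HOTA ↑ | DetA ↑ | AssA ↑ | IDF1 ↑ | MOTA ↑ |
|---|---|---|---|---|---|
| Non End-to-End | | | | | |
| ByteTrack | 47.4 | 71.0 | 32.1 | 53.9 | 89.6 |
| MOTRv2 | 69.9 | 83.0 | 59.0 | 71.7 | 91.9 |
| End-to-End | | | | | |
| MOTR | 54.2 | 73.5 | 40.2 | 51.5 | 79.7 |
| CO-MOT | 69.4 | 82.1 | 58.9 | 71.9 | 91.2 |
| SelfMOTR (Ours) | 69.2 | 80.9 | 59.3 | 72.5 | 89.9 |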
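
As a reading aid for the DanceTrack numbers: HOTA is defined per localization threshold as the geometric mean of detection and association accuracy and then averaged over thresholds, so the geometric mean of the reported DetA and AssA is only an approximation that lands at or slightly above the reported HOTA. The short check below, a sketch with a hypothetical helper name, applies that approximation to the rows of the table.

```python
import math

def hota_from_components(det_a, ass_a):
    """Geometric mean of DetA and AssA, a rough proxy for the reported HOTA."""
    return math.sqrt(det_a * ass_a)

# (DetA, AssA, reported HOTA) copied from the DanceTrack results.
rows = {
    "ByteTrack": (71.0, 32.1, 47.4),
    "MOTRv2": (83.0, 59.0, 69.9),
    "MOTR": (73.5, 40.2, 54.2),
    "CO-MOT": (82.1, 58.9, 69.4),
    "SelfMOTR": (80.9, 59.3, 69.2),
}

# Gap between the proxy and the reported value, per model.
gaps = {}
for name, (det_a, ass_a, reported) in rows.items():
    approx = hota_from_components(det_a, ass_a)
    gaps[name] = approx - reported
    print(f"{name}: approx {approx:.1f}, reported {reported}")
```

The proxy makes the trade-off in the table easier to see: SelfMOTR's DetA is lower than that of CO-MOT and MOTRv2, while its AssA is the highest of the listed models, and the two effects nearly cancel in HOTA.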

Results on BFT and AnimalTrack

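| Model | BFT HOTA ↑ | BFT IDF1 ↑ | BFT MOTA ↑ | AnimalTrack HOTA ↑ | AnimalTrack IDF1 ↑ | AnimalTrack MOTA ↑ |
|---|---|---|---|---|---|---|
| Non End-to-End | | | | | | |
| SORT | 61.2 | 77.2 | 75.5 | 42.8 | 49.2 | 55.6 |
| ByteTrack | 62.5 | 82.3 | 77.2 | 40.1 | 51.2 | 38.5 |
| OC-SORT | 66.8 | 79.3 | 77.1 | - | - | - |
| QDTrack | - | - | - | 47.0 | 56.3 | 55.7 |
| End-to-End | | | | | | |
| TransTrack | 62.1 | 71.4 | 71.4 | 45.4 | 53.4 | 48.3 |
| TrackFormer | 63.3 | 72.4 | 74.1 | 31.0 | 36.5 | 20.4 |
| SambaMOTR | 69.6 | 81.9 | 72.0 | - | - | - |
| SelfMOTR (Ours) | 71.1 | 82.7 | 77.6 | 45.5 | 53.7 | 49.5 |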

Citation

@article{gulhan2025selfmotr,
  title={SelfMOTR: Revisiting MOTR with Self-Generating Detection Priors},
  author={G{\"u}lhan, Fabian and Mededovic, Emil and Wu, Yuli and Stegmaier, Johannes},
  journal={arXiv preprint arXiv:2511.20279},
  year={2025}
}