SelfMOTR: Revisiting MOTR with Self-Generating Detection Priors

Fabian Gülhan¹, Emil Mededovic¹, Yuli Wu¹,², Johannes Stegmaier¹,²

¹ Chair of Imaging and Computer Vision, RWTH Aachen University, Aachen, Germany
² Heinrich Heine University Düsseldorf, Faculty of Mathematics and Natural Sciences, Machine Learning for Medical Data, Düsseldorf, Germany

Paper Code (Coming Soon)

Abstract

End-to-end transformer architectures have driven significant progress in multi-object tracking by unifying detection and association into a single, heuristic-free framework. Despite these benefits, poor detection performance and the inherent conflict between detection and association in a joint architecture remain critical concerns. Recent approaches aim to mitigate these issues by employing advanced denoising or label assignment strategies, or by incorporating detection priors from external object detectors. In this paper, we propose SelfMOTR, a simple yet highly effective detector-free alternative that decouples proposal discovery from association using self-generated internal detection priors. Through extensive analysis and ablation studies, we show that end-to-end transformer trackers with joint detection–association decoding retain substantial hidden detection capacity, and we provide a practical detector-free mechanism for leveraging it. To shed light on these joint decoding dynamics, we draw inspiration from attention sink analyses in large language models, leveraging Track Attention Mass to show that standard generic queries exhibit unbalanced attention, frequently struggling to weigh track context against novel object discovery. SelfMOTR achieves highly competitive performance in complex, dynamic environments, yielding 69.2 HOTA on DanceTrack and leading with 71.1 HOTA on the Bird Flock Tracking (BFT) dataset.
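
The abstract does not define Track Attention Mass on this page, so the following is only an illustrative sketch under an assumed definition: for each query, the share of its softmax attention that lands on keys belonging to track queries rather than on generic detection queries. The function names `softmax` and `track_attention_mass`, the toy logits, and the key mask are all hypothetical and are not taken from the paper or its code.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of attention logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def track_attention_mass(logits, is_track_key):
    """Assumed definition: per query, the fraction of attention on track keys.

    logits:        list of rows, one row of raw attention scores per query
    is_track_key:  list of booleans, True where the key is a track query
    """
    masses = []
    for row in logits:
        weights = softmax(row)
        masses.append(sum(w for w, t in zip(weights, is_track_key) if t))
    return masses


# Toy example: 2 queries attending over 4 keys; the first 2 keys are track queries.
logits = [
    [0.0, 0.0, 0.0, 0.0],  # uniform attention
    [2.0, 2.0, 0.0, 0.0],  # attention skewed toward the track keys
]
is_track_key = [True, True, False, False]
masses = track_attention_mass(logits, is_track_key)
print(masses)
```

Under this assumed definition, a value near 1 would indicate a query that attends almost exclusively to existing tracks, and a value near 0 one that largely ignores them. The imbalance the abstract describes would show up as values clustered at one extreme.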

Results on DanceTrack

Model	HOTA ↑	DetA ↑	AssA ↑	IDF1 ↑	MOTA ↑
Non End-to-End
ByteTrack	47.4	71.0	32.1	53.9	89.6
MOTRv2	69.9	83.0	59.0	71.7	91.9
End-to-End
MOTR	54.2	73.5	40.2	51.5	79.7
CO-MOT	69.4	82.1	58.9	71.9	91.2
SelfMOTR (Ours)	69.2	80.9	59.3	72.5	89.9
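
As a reading aid for the table above: HOTA is defined per localization threshold as the geometric mean of detection and association accuracy, and the reported score averages that quantity over thresholds. Taking the square root of the product of the already-averaged DetA and AssA columns is therefore only an approximation of the reported HOTA. The sketch below checks how close that approximation comes for the rows listed here. The helper name `hota_estimate` is illustrative and not part of any evaluation toolkit.

```python
import math

# (DetA, AssA, reported HOTA) copied from the DanceTrack table above.
rows = {
    "ByteTrack": (71.0, 32.1, 47.4),
    "MOTRv2": (83.0, 59.0, 69.9),
    "MOTR": (73.5, 40.2, 54.2),
    "CO-MOT": (82.1, 58.9, 69.4),
    "SelfMOTR": (80.9, 59.3, 69.2),
}


def hota_estimate(det_a, ass_a):
    """Approximate HOTA as the geometric mean of averaged DetA and AssA."""
    return math.sqrt(det_a * ass_a)


# Gap between the approximation and the reported HOTA for each tracker.
gaps = {}
for name, (det_a, ass_a, hota) in rows.items():
    estimate = hota_estimate(det_a, ass_a)
    gaps[name] = estimate - hota
    print(f"{name}: estimate {estimate:.2f}, reported {hota:.1f}")
```

The estimate makes the trade-off in the table easier to see: SelfMOTR's DetA is lower than CO-MOT's and MOTRv2's, while its AssA is slightly higher, which is why the three HOTA scores end up close together.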

Results on BFT and AnimalTrack

	BFT			AnimalTrack
Model	HOTA ↑	IDF1 ↑	MOTA ↑	HOTA ↑	IDF1 ↑	MOTA ↑
Non End-to-End
SORT	61.2	77.2	75.5	42.8	49.2	55.6
ByteTrack	62.5	82.3	77.2	40.1	51.2	38.5
OC-SORT	66.8	79.3	77.1	-	-	-
QDTrack	-	-	-	47.0	56.3	55.7
End-to-End
TransTrack	62.1	71.4	71.4	45.4	53.4	48.3
TrackFormer	63.3	72.4	74.1	31.0	36.5	20.4
SambaMOTR	69.6	81.9	72.0	-	-	-
SelfMOTR (Ours)	71.1	82.7	77.6	45.5	53.7	49.5

Citation

@article{gulhan2025selfmotr,
  title={SelfMOTR: Revisiting MOTR with Self-Generating Detection Priors},
  author={G{\"u}lhan, Fabian and Mededovic, Emil and Wu, Yuli and Stegmaier, Johannes},
  journal={arXiv preprint arXiv:2511.20279},
  year={2025}
}