SelfMOTR: Revisiting MOTR with Self-Generating Detection Priors
Machine Learning for Medical Data, Düsseldorf, Germany
Abstract
End-to-end transformer architectures have driven significant progress in multi-object tracking by unifying detection and association into a single, heuristic-free framework. Despite these benefits, poor detection performance and the inherent conflict between detection and association in a joint architecture remain critical concerns. Recent approaches aim to mitigate these issues by employing advanced denoising or label assignment strategies, or by incorporating detection priors from external object detectors. In this paper, we propose SelfMOTR, a simple yet highly effective detector-free alternative that decouples proposal discovery from association using self-generated internal detection priors. Through extensive analysis and ablation studies, we show that end-to-end transformer trackers with joint detection–association decoding retain substantial hidden detection capacity, and we provide a practical detector-free mechanism for leveraging it. To shed light on these joint decoding dynamics, we draw inspiration from attention sink analyses in large language models, leveraging Track Attention Mass to show that standard generic queries exhibit unbalanced attention, frequently struggling to weigh track context against novel object discovery. SelfMOTR achieves highly competitive performance in complex, dynamic environments, yielding 69.2 HOTA on DanceTrack and leading with 71.1 HOTA on the Bird Flock Tracking (BFT) dataset.
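To make the Track Attention Mass idea concrete, here is a minimal sketch. It assumes the quantity is the share of one query's softmax attention that lands on track-query keys; the abstract does not give the formal definition, so the paper's version may differ. The function names, the logits, and the key layout are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of attention logits."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def track_attention_mass(scores, is_track):
    """Share of one query's attention that falls on track-query keys.

    Assumed definition for illustration only, not taken from the paper.
    scores:   attention logits of a single query over all keys
    is_track: one boolean per key, True if that key is a track query
    """
    weights = softmax(scores)
    return sum(w for w, flag in zip(weights, is_track) if flag)


# Hypothetical logits of one generic query over five keys:
# the first two keys are track queries, the rest are generic queries.
is_track = [True, True, False, False, False]
skewed = track_attention_mass([2.0, 2.0, 0.0, 0.0, 0.0], is_track)
uniform = track_attention_mass([0.0, 0.0, 0.0, 0.0, 0.0], is_track)

print(f"skewed logits:  {skewed:.3f}")
print(f"uniform logits: {uniform:.3f}")
```

Under this assumed definition, uniform logits give a mass equal to the fraction of keys that are track queries (here 2 of 5). Values far above or below that baseline would indicate a query leaning heavily toward track context or toward the remaining keys.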
Results on DanceTrack
| Model | HOTA ↑ | DetA ↑ | AssA ↑ | IDF1 ↑ | MOTA ↑ |
|---|---|---|---|---|---|
| Non End-to-End | | | | | |
| ByteTrack | 47.4 | 71.0 | 32.1 | 53.9 | 89.6 |
| MOTRv2 | 69.9 | 83.0 | 59.0 | 71.7 | 91.9 |
| End-to-End | | | | | |
| MOTR | 54.2 | 73.5 | 40.2 | 51.5 | 79.7 |
| CO-MOT | 69.4 | 82.1 | 58.9 | 71.9 | 91.2 |
| SelfMOTR (Ours) | 69.2 | 80.9 | 59.3 | 72.5 | 89.9 |
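As a reading aid for the DanceTrack table above: HOTA is defined per localization threshold as the geometric mean of DetA and AssA, then averaged over thresholds. The geometric mean of the already-averaged DetA and AssA columns is therefore only an approximation of the reported HOTA (by the Cauchy-Schwarz inequality it should not fall below it, up to rounding). The short sketch below checks how close that approximation is for the rows in the table. The variable and function names are illustrative.

```python
import math

# (HOTA, DetA, AssA) copied from the DanceTrack table above.
reported = {
    "ByteTrack": (47.4, 71.0, 32.1),
    "MOTRv2": (69.9, 83.0, 59.0),
    "MOTR": (54.2, 73.5, 40.2),
    "CO-MOT": (69.4, 82.1, 58.9),
    "SelfMOTR": (69.2, 80.9, 59.3),
}


def geometric_mean_estimate(det_a, ass_a):
    """Approximate HOTA from the threshold-averaged DetA and AssA."""
    return math.sqrt(det_a * ass_a)


# Gap between the approximation and the reported HOTA, per tracker.
gaps = {}
for name, (hota, det_a, ass_a) in reported.items():
    estimate = geometric_mean_estimate(det_a, ass_a)
    gaps[name] = estimate - hota
    print(f"{name:10s} reported={hota:5.1f}  sqrt(DetA*AssA)={estimate:6.2f}")
```

For every row the approximation lands within half a HOTA point of the reported value. This makes the trade-off in the table easy to read: SelfMOTR's lower DetA relative to MOTRv2 and CO-MOT is largely offset by its higher AssA.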
Results on BFT and AnimalTrack
| Model | BFT HOTA ↑ | BFT IDF1 ↑ | BFT MOTA ↑ | AnimalTrack HOTA ↑ | AnimalTrack IDF1 ↑ | AnimalTrack MOTA ↑ |
|---|---|---|---|---|---|---|
| Non End-to-End | | | | | | |
| SORT | 61.2 | 77.2 | 75.5 | 42.8 | 49.2 | 55.6 |
| ByteTrack | 62.5 | 82.3 | 77.2 | 40.1 | 51.2 | 38.5 |
| OC-SORT | 66.8 | 79.3 | 77.1 | - | - | - |
| QDTrack | - | - | - | 47.0 | 56.3 | 55.7 |
| End-to-End | | | | | | |
| TransTrack | 62.1 | 71.4 | 71.4 | 45.4 | 53.4 | 48.3 |
| TrackFormer | 63.3 | 72.4 | 74.1 | 31.0 | 36.5 | 20.4 |
| SambaMOTR | 69.6 | 81.9 | 72.0 | - | - | - |
| SelfMOTR (Ours) | 71.1 | 82.7 | 77.6 | 45.5 | 53.7 | 49.5 |
Citation
@article{gulhan2025selfmotr,
title={SelfMOTR: Revisiting MOTR with Self-Generating Detection Priors},
author={G{\"u}lhan, Fabian and Mededovic, Emil and Wu, Yuli and Stegmaier, Johannes},
journal={arXiv preprint arXiv:2511.20279},
year={2025}
}