Towards Bezier Deformable Attention for Road Topology Understanding
Understanding road topology is crucial for autonomous driving. This paper introduces TopoBDA (Topology with Bezier Deformable Attention), a novel approach that enhances road topology comprehension by leveraging Bezier Deformable Attention (BDA). TopoBDA processes multi-camera 360-degree imagery to generate Bird's Eye View (BEV) features, which are refined through a transformer decoder employing BDA. BDA utilizes Bezier control points to drive the deformable attention mechanism, improving the detection and representation of elongated and thin polyline structures, such as lane centerlines.
Additionally, TopoBDA integrates two auxiliary components: an instance mask formulation loss and a one-to-many set prediction loss strategy, to further refine centerline detection and enhance road topology understanding. Experimental evaluations on the OpenLane-V2 dataset demonstrate that TopoBDA outperforms existing methods, achieving state-of-the-art results in centerline detection and topology reasoning. TopoBDA also achieves the best results on the OpenLane-V1 dataset in 3D lane detection. Further experiments on integrating multi-modal data, such as LiDAR, radar, and SDMap, show that multi-modal inputs can further enhance performance in road topology understanding.
For LiDAR integration experiments with OpenLane-V2, we have created a specialized fork with LiDAR point cloud data integration and visualization capabilities.
First-time integration of Multi-Point Deformable Attention (MPDA) into Bezier keypoint-dependent transformer decoders, enhancing centerline detection for elongated polyline structures.
Novel attention mechanism utilizing Bezier control points as reference points, achieving superior performance with reduced computational complexity compared to traditional MPDA approaches.
Indirect auxiliary supervision through instance mask prediction and Mask-L1 mix matcher, improving centerline detection without inference overhead.
First comprehensive evaluation of camera, LiDAR, radar, and SDMap integration for road topology understanding, achieving state-of-the-art multi-modal performance.
Systematic comparative analysis of attention mechanisms for road topology understanding, covering Standard, Masked, Single-Point, Multi-Point, and Bezier Deformable Attention, and comparing computational complexity (FLOPs), parameter count, runtime efficiency, and detection performance.
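The idea of using Bezier control points as reference points for deformable attention can be illustrated with a small sketch. This is not the TopoBDA implementation. It assumes a single-channel BEV grid, one attention head, and one sampling location per control point. The offsets and attention logits, which a real decoder would predict from the query, are passed in by hand. All function names are hypothetical.

```python
import math


def bilinear_sample(bev, x, y):
    """Bilinearly interpolate a 2D grid `bev` (list of rows) at continuous (x, y)."""
    h, w = len(bev), len(bev[0])
    # Clamp to the grid so the four neighbouring cells always exist.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = bev[y0][x0] * (1 - fx) + bev[y0][x1] * fx
    bottom = bev[y1][x0] * (1 - fx) + bev[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def bezier_deformable_attention(bev, control_points, offsets, logits):
    """Aggregate BEV features sampled around each Bezier control point.

    control_points: (x, y) reference points in grid coordinates.
    offsets: (dx, dy) per control point (learned in a real model; assumed here).
    logits: unnormalised attention scores, one per control point (assumed here).
    """
    # Softmax over the sampling locations.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    out = 0.0
    for (cx, cy), (dx, dy), w in zip(control_points, offsets, weights):
        out += w * bilinear_sample(bev, cx + dx, cy + dy)
    return out


# Toy BEV grid whose value equals the column index.
bev = [[float(x) for x in range(4)] for _ in range(4)]
control_points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
offsets = [(0.0, 0.0)] * 4
logits = [0.0] * 4  # uniform attention weights
feature = bezier_deformable_attention(bev, control_points, offsets, logits)
```

In this sketch the number of sampling locations is tied to the number of control points, so no separate set of reference points has to be sampled along the curve before attention is applied.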
Multi-camera 360-degree imagery is converted into BEV features, which are refined by a transformer decoder with a sparse query approach: each query represents a whole centerline instance rather than an individual point.
Compact polyline representation through Bezier control points, enabling efficient centerline modeling via matrix multiplication operations while reducing computational complexity at regression heads.
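How a polyline follows from control points through a single matrix multiplication can be sketched as below. The sketch builds a Bernstein basis matrix B and computes the sampled points as B times P, where P holds the control points. The choice of four control points (a cubic curve), the number of samples, and the function names are assumptions made for illustration and are not taken from the paper.

```python
from math import comb


def bernstein_matrix(num_samples, degree):
    """Return a num_samples x (degree + 1) matrix of Bernstein basis values.

    Row i corresponds to parameter t = i / (num_samples - 1); num_samples >= 2.
    """
    rows = []
    for i in range(num_samples):
        t = i / (num_samples - 1)
        rows.append([
            comb(degree, k) * (t ** k) * ((1 - t) ** (degree - k))
            for k in range(degree + 1)
        ])
    return rows


def sample_polyline(control_points, num_samples):
    """Compute polyline points as B @ P with plain Python lists."""
    degree = len(control_points) - 1
    basis = bernstein_matrix(num_samples, degree)
    dims = len(control_points[0])
    return [
        [sum(b * p[d] for b, p in zip(row, control_points)) for d in range(dims)]
        for row in basis
    ]


# Hypothetical cubic centerline in 3D (x, y, z).
control_points = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0), (3.0, 2.0, 0.0), (4.0, 0.0, 0.0)]
polyline = sample_polyline(control_points, 5)
```

Because the basis matrix depends only on the curve degree and the number of samples, it can be computed once and reused, so a regression head only needs to predict the control points.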
Instance mask prediction and Mask-L1 mix matcher during training enhance centerline detection accuracy. One-to-many set prediction loss improves training convergence without inference overhead.
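One plausible reading of a matcher that mixes mask and L1 terms is a minimum-cost assignment whose cost adds an L1 distance on control points to a Dice cost on instance masks. The sketch below follows that reading and is not the authors' formulation. The cost weights, the Dice smoothing term, the dictionary keys `points` and `mask`, and the way one-to-many matching is realised (duplicating each ground truth `repeats` times) are all assumptions. The assignment is found by brute force, which is only practical for toy inputs.

```python
from itertools import permutations


def l1_cost(pred_pts, gt_pts):
    """Sum of absolute coordinate differences between two control point sets."""
    return sum(abs(a - b) for p, g in zip(pred_pts, gt_pts) for a, b in zip(p, g))


def dice_cost(pred_mask, gt_mask, eps=1.0):
    """1 - Dice overlap between two flattened masks with values in [0, 1]."""
    inter = sum(p * g for p, g in zip(pred_mask, gt_mask))
    return 1.0 - (2.0 * inter + eps) / (sum(pred_mask) + sum(gt_mask) + eps)


def match(preds, gts, w_l1=1.0, w_mask=1.0, repeats=1):
    """Return sorted (pred_index, gt_index) pairs with minimum total mixed cost.

    repeats > 1 duplicates every ground truth, so each one is matched to
    several predictions (an assumed form of one-to-many matching).
    """
    targets = [j for j in range(len(gts)) for _ in range(repeats)]
    if len(targets) > len(preds):
        raise ValueError("not enough predictions for the requested matching")
    cost = [
        [
            w_l1 * l1_cost(p["points"], gts[j]["points"])
            + w_mask * dice_cost(p["mask"], gts[j]["mask"])
            for j in targets
        ]
        for p in preds
    ]
    best, best_total = None, float("inf")
    # Brute force over assignments: exponential, for illustration only.
    for perm in permutations(range(len(preds)), len(targets)):
        total = sum(cost[i][col] for col, i in enumerate(perm))
        if total < best_total:
            best, best_total = perm, total
    return sorted((i, targets[col]) for col, i in enumerate(best))


gts = [
    {"points": [(0.0, 0.0), (1.0, 0.0)], "mask": [1, 1, 0, 0]},
    {"points": [(0.0, 5.0), (1.0, 5.0)], "mask": [0, 0, 1, 1]},
]
preds = [
    {"points": [(0.0, 5.1), (1.0, 4.9)], "mask": [0.0, 0.0, 0.9, 0.8]},
    {"points": [(0.0, 0.1), (1.0, 0.0)], "mask": [0.9, 0.9, 0.1, 0.0]},
    {"points": [(0.0, 9.0), (1.0, 9.0)], "mask": [0.0, 0.0, 0.0, 0.0]},
    {"points": [(0.0, 0.3), (1.0, 0.2)], "mask": [0.7, 0.8, 0.0, 0.0]},
]
one_to_one = match(preds[:3], gts)
one_to_many = match(preds, gts, repeats=2)
```

Matching happens only when the training loss is computed, which is consistent with the statement that these components add no inference overhead. A practical implementation would replace the brute-force search with the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment`.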
| Method | Sensor | DETl | DETt | TOPll | TOPlt | OLS |
|---|---|---|---|---|---|---|
| TopoNet | C | 28.6 | 48.6 | 10.9 | 23.9 | 39.8 |
| TopoMLP | C | 28.5 | 49.5 | 21.7 | 26.9 | 44.1 |
| TopoFormer | C | 34.7 | 48.2 | 24.1 | 29.5 | 46.3 |
| TopoMaskV2 | C | 34.5 | 53.8 | 24.5 | 35.6 | 49.4 |
| TopoBDA (Ours) | C | 38.9 | 54.3 | 27.6 | 37.3 | 51.7 |
| TopoBDA (Ours) | C + L | 47.3 | 54.0 | 35.5 | 41.9 | 56.4 |
| TopoBDA (Ours) | C + SD | 42.7 | 52.4 | 34.3 | 41.7 | 54.6 |
| TopoBDA (Ours) | C + L + SD | 52.0 | 52.4 | 38.5 | 45.3 | 58.4 |
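The OLS column can be reproduced from the other four columns. The sketch below assumes the standard OpenLane-V2 definition, in which OLS is the mean of DETl, DETt, and the square roots of TOPll and TOPlt, with all scores expressed as fractions. The function name is hypothetical.

```python
from math import sqrt


def ols(det_l, det_t, top_ll, top_lt):
    """OpenLane-V2 Score from percentage inputs; topology terms are square-rooted."""
    terms = [
        det_l / 100.0,
        det_t / 100.0,
        sqrt(top_ll / 100.0),
        sqrt(top_lt / 100.0),
    ]
    return 100.0 * sum(terms) / len(terms)


# Rows taken from the table above.
camera_only = ols(38.9, 54.3, 27.6, 37.3)
camera_lidar = ols(47.3, 54.0, 35.5, 41.9)
camera_lidar_sdmap = ols(52.0, 52.4, 38.5, 45.3)
print(round(camera_only, 1), round(camera_lidar, 1), round(camera_lidar_sdmap, 1))
# prints 51.7 56.4 58.4
```

The square root raises the contribution of the topology scores, which are typically lower than the detection scores, so gains in TOPll and TOPlt are visible in the aggregate.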
| Distance | Method | Backbone | F1-Score ↑ | X-error near (m) ↓ | X-error far (m) ↓ | Z-error near (m) ↓ | Z-error far (m) ↓ |
|---|---|---|---|---|---|---|---|
| 1.5m | PersFormer | ResNet-50 | 52.7 | 0.307 | 0.319 | 0.083 | 0.117 |
| 1.5m | Anchor3DLane | ResNet-50 | 57.5 | 0.229 | 0.243 | 0.079 | 0.106 |
| 1.5m | GroupLane | ResNet-50 | 60.2 | 0.371 | 0.476 | 0.220 | 0.357 |
| 1.5m | LaneCPP | EffNet-B7 | 60.3 | 0.264 | 0.310 | 0.077 | 0.117 |
| 1.5m | LATR | ResNet-50 | 61.9 | 0.219 | 0.259 | 0.075 | 0.104 |
| 1.5m | PVALane | ResNet-50 | 62.7 | 0.232 | 0.259 | 0.092 | 0.118 |
| 1.5m | TopoBDA (Ours) | ResNet-50 | 63.9 | 0.224 | 0.243 | 0.069 | 0.101 |
| 0.5m | PersFormer | ResNet-50 | 43.2 | 0.229 | 0.245 | 0.078 | 0.106 |
| 0.5m | DV-3DLane | ResNet-34 | 52.9 | 0.173 | 0.212 | 0.069 | 0.098 |
| 0.5m | LATR | ResNet-50 | 54.0 | 0.171 | 0.201 | 0.072 | 0.099 |
| 0.5m | TopoBDA (Ours) | ResNet-50 | 57.9 | 0.157 | 0.179 | 0.067 | 0.087 |
| IMAL | ML1M | DETl | DETl_ch | TOPll | OLSl |
|---|---|---|---|---|---|
| ✗ | ✗ | 37.0 | 39.8 | 29.0 | 43.6 |
| ✓ | ✗ | 40.7 | 42.1 | 32.4 | 46.6 |
| ✓ | ✓ | 40.8 | 45.8 | 32.9 | 48.0 |
IMAL: Instance Mask Auxiliary Loss | ML1M: Mask-L1 Mix Matcher
Together, the instance mask auxiliary loss and the Mask-L1 mix matcher improve DETl by 3.8 points (37.0 to 40.8) and OLSl by 4.4 points (43.6 to 48.0) over the baseline without either component.
| Attention Type | DETl | DETl_ch | TOPll | OLSl |
|---|---|---|---|---|
| SA (Standard Attention) | 34.5 | 38.4 | 25.1 | 41.0 |
| MA (Masked Attention) | 35.8 | 40.2 | 26.9 | 42.6 |
| SPDA (Single-Point Deformable Attention) | 38.3 | 39.8 | 29.5 | 44.1 |
| MPDA4 (4-Point Deformable Attention) | 40.2 | 45.0 | 32.6 | 47.4 |
| MPDA16 (16-Point Deformable Attention) | 40.3 | 45.1 | 32.7 | 47.5 |
| BDA (Bezier Deformable Attention) | 40.8 | 45.8 | 32.9 | 48.0 |
Experimental evaluations demonstrate that TopoBDA achieves state-of-the-art performance across both subsets of the OpenLane-V2 dataset. Specifically, TopoBDA surpasses existing methods with a DETl score of 38.9 and an OLS score of 51.7 in Subset-A, and a DETl score of 45.1 and an OLS score of 54.3 in Subset-B.
The integration of multi-modal data significantly boosts performance: fusing camera and LiDAR data increases the OLS score in Subset-A from 51.7 to 56.4, and in Subset-B from 54.3 to 61.7. Further incorporating SDMap alongside camera and LiDAR sensors raises the OLS score in Subset-A to 58.4. These results underscore the effectiveness of TopoBDA in road topology comprehension and highlight the substantial benefits of multi-modal fusion.
Additionally, TopoBDA achieves superior results on the OpenLane-V1 benchmark for 3D lane detection, with F1-scores of 63.9 at a 1.5m distance and 57.9 at a 0.5m distance. This work contributes toward closing existing gaps in HDMap element prediction, offering a unified framework for road topology understanding and 3D lane detection in autonomous driving.
@article{kalfaoglu2026topobda,
title={TopoBDA: Towards Bezier Deformable Attention for Road Topology Understanding},
author={Kalfaoglu, Muhammet Esat and Ozturk, Halil Ibrahim and Kilinc, Ozsel and Temizel, Alptekin},
journal={Neurocomputing},
volume={670},
pages={132360},
year={2026},
publisher={Elsevier},
doi={10.1016/j.neucom.2025.132360}
}