ADAS perception: 120ms → 38ms
Context: Embedded ADAS, single-camera pipeline, Orin NX target, hard 50ms latency budget.
Outcome
- Latency: 120ms → 38ms (p95)
- Throughput: +2.3×
- Accuracy: ΔmAP ≤ 0.2
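For readers unfamiliar with the p95 figure above, here is a minimal sketch of how a tail-latency number can be computed from per-frame timings and compared with the 50ms budget. It assumes the nearest-rank percentile definition. The helper names `percentile` and `meets_budget` and the sample timings are hypothetical illustrations, not the project's harness or its measured data.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # Multiply before dividing so integer pct values give an exact rank.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_budget(latencies_ms, budget_ms=50.0, pct=95):
    """True if the pct-th percentile latency is within the budget."""
    return percentile(latencies_ms, pct) <= budget_ms


if __name__ == "__main__":
    # Hypothetical per-frame latencies in ms (illustrative values only).
    latencies_ms = [30.0] * 19 + [60.0]
    p95 = percentile(latencies_ms, 95)
    print(f"p95 = {p95} ms, within budget: {meets_budget(latencies_ms)}")
```

Reporting a percentile rather than the mean means occasional slow frames show up in the headline number, which matters when the budget is a hard limit.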
What I did
- Profiled pre/post-processing with NVTX ranges; fused ops
- Applied INT8 quantization with per-tensor calibration
- Built TensorRT engines with layer fusion
- Used pinned memory and tuned batch size
- Built an on-device eval harness and produced a benchmark report
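To illustrate the per-tensor INT8 calibration step listed above, here is a minimal sketch in plain Python. It assumes symmetric quantization with simple max calibration, where one scale per tensor is derived from the largest absolute activation in the calibration data. This is a simplified stand-in, not TensorRT's calibrator and not necessarily the scheme used in this project. The function names are hypothetical.

```python
def calibrate_scale(calibration_batches):
    """Derive one scale for the whole tensor from the largest absolute
    value observed across all calibration batches (max calibration)."""
    amax = max(abs(v) for batch in calibration_batches for v in batch)
    if amax == 0.0:
        raise ValueError("calibration data is all zeros")
    # Map [-amax, amax] onto the symmetric INT8 range [-127, 127].
    return amax / 127.0


def quantize(values, scale):
    """Round to the nearest INT8 step and clip to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(quantized, scale):
    """Map INT8 codes back to approximate float values."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    # Hypothetical activation samples (illustrative values only).
    batches = [[0.5, -1.27], [1.0, 0.02]]
    scale = calibrate_scale(batches)
    codes = quantize([0.3, -0.77, 5.0], scale)
    print(codes, dequantize(codes, scale))
```

Values outside the calibrated range, such as the 5.0 above, are clipped to the largest code. That is why the calibration set needs to be representative of on-device inputs.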
Stack
PyTorch → ONNX → TensorRT, Jetson Orin NX, Python/C++, Triton Inference Server