CV

Education

Institute of Automation, Chinese Academy of Sciences (CASIA) - Ph.D. in Pattern Recognition and Intelligent Systems, Sep 2022 - Jun 2027 (expected)
State Key Laboratory of Multimodal Artificial Intelligence Systems

Tianjin University - B.Eng. in Computer Science and Technology, Sep 2018 - Jun 2022
College of Intelligence and Computing (New Engineering Class); Rank: 3/59; Weighted GPA: 90.2/100


Research Highlights

AutoPrune: Each Complexity Deserves a Pruning Policy

NeurIPS 2025 - First Author

Authors: Hanshi Wang, Yuhao Xu, Zekun Xu, Jin Gao, Yufan Liu, Weiming Hu, Ke Wang, Zhipeng Zhang

  • Motivation: In vision-language models (VLMs) and end-to-end autonomous driving, long visual token sequences create memory and latency bottlenecks. Existing training-free pruning methods typically apply fixed keep ratios without global compute control, which limits performance on reasoning-heavy tasks.
  • Method: AutoPrune estimates mutual information between early visual and textual tokens and maps it to a budget-constrained logistic retention curve, producing per-sample, per-task, per-layer adaptive keep ratios under any target token/FLOPs budget.
  • Results: On LLaVA-1.5-7B and other VLM/VLA models, AutoPrune removes up to 89% of visual tokens and cuts FLOPs by 76.8% while retaining ~96.7% of the original average accuracy, surpassing PDrop by 9.1%, with consistent gains on standard VLM and autonomous-driving benchmarks.

Online Segment Any 3D Thing as Instance Tracking

NeurIPS 2025 - First Author

Authors: Hanshi Wang, Zijian Cai, Jin Gao, Yiwei Zhang, Weiming Hu, Ke Wang, Zhipeng Zhang

  • Motivation: Online 3D instance segmentation built on vision foundation models (VFMs) often yields fragmented masks, over-segmentation, and identity drift because it lacks temporal modeling.
  • Method: Recasts the task as instance tracking with three lightweight modules: LTM (a bounded track bank with confidence-gated Hungarian assignment), STM (distance-aware cross-frame attention for short-term context), and SCL (merges high-affinity fragments using joint 2D/3D reasoning and one-to-many supervision).
  • Results: Improves AP by 2.8 over ESAM on ScanNet200 while maintaining real-time throughput; consistent improvements on ScanNet, SceneNN, and 3RScan.

MambaFusion: Height-Fidelity Dense Global Fusion for Multi-modal 3D Object Detection

ICCV 2025 Highlight - First Author

Authors: Hanshi Wang, Jin Gao, Weiming Hu, Zhipeng Zhang

  • Motivation: Camera+LiDAR 3D detection struggles to simultaneously achieve efficiency, long-range modeling, and full-scene retention.
  • Method: Set-based dense global fusion built on Mamba, a linear-complexity sequence model. Height-Fidelity LiDAR encoding and a Hybrid Mamba Block preserve height cues, align the two modalities, and model both local and global context in linear time.
  • Results: 75.0 NDS on nuScenes val, achieving state-of-the-art accuracy with faster inference.

A-Teacher: Asymmetric Network for 3D Semi-Supervised Object Detection

CVPR 2024 - First Author

Authors: Hanshi Wang, Zhipeng Zhang, Jin Gao, Weiming Hu

  • Motivation: Using symmetric architectures and input formats for the teacher and student weakens distillation and under-utilizes temporal cues.
  • Method: First online asymmetric semi-supervised 3D detection framework with attention-based refinement; leverages past/future cues in a divide-and-conquer strategy to correct poor detections, misses, and false positives.
  • Results: On Waymo, improves mAP (L1) by 4.7 over prior SOTA with fewer training resources.

The Devil is in the Quality: Exploring Informative Samples for Semi-Supervised Monocular 3D Object Detection

ICRA 2025 - Co-First Author

Authors: Zhipeng Zhang*, Zhenyu Li*, Hanshi Wang*, Yuan He, Ke Wang, Heng Fan

  • Motivation: Semi-supervised monocular 3D detection suffers from noisy pseudo labels and low learning efficiency.
  • Method: Augment-Criticize learns transformations and aggregates predictions to mine reliable pseudo labels; Critical Retraining Strategy (CRS) dynamically evaluates pseudo-label contributions to suppress noise.
  • Results: Significant gains when applied to both MonoDLE and MonoFlex, demonstrating that the approach generalizes across detectors.

HDGS: Hierarchical Dynamic Gaussian for Urban Driving Scenes

AAAI 2026 Oral

Authors: Fudong Ge, Jin Gao, Hanshi Wang, Yiwei Zhang, Ke Wang, Weiming Hu, Zhipeng Zhang

  • Motivation: Dynamic-scene 3D Gaussian Splatting faces a fidelity-storage trade-off in urban driving.
  • Method: Hierarchical Dynamic Gaussian Splatting (HDGS) with multi-layer anchors; enforces global-local and depth-consistency constraints for efficient high-fidelity compression.
  • Results: Reduces storage by an average of 62% while delivering superior rendering quality.

Integrating Diverse Assignment Strategies into DETRs

AAAI 2026

Authors: Yiwei Zhang, Jin Gao, Hanshi Wang, Fudong Ge, Guan Luo, Weiming Hu, Zhipeng Zhang

  • Motivation: One-to-one matching in DETR leads to slow convergence and sensitivity to assignment choices.
  • Method: LoRA-DETR with stage-wise assignment via LoRA; each stage uses a different, complementary strategy and supervises later stages to stabilize training and enrich features.
  • Results: Consistent improvements across datasets and metrics, with faster convergence and higher accuracy.

Competitions & Awards

RoboSense Challenge 2025 (IROS) - Track #1: Driving with Language - 3rd Place
We designed a unified surround-view framework that integrates perception, prediction, and planning with temporal history, and combined training-free mapping with chain-of-thought reasoning in a general-purpose VLM. We then fine-tuned Qwen-VL-72B on DriveLM with distilled reasoning traces. The system achieved a weighted score of 64.29 on the official test set, 20.59 points above the baseline, and remained robust under noisy conditions.

Additional Honors

  • Outstanding Student - University of Chinese Academy of Sciences (UCAS) (2025)
  • Excellent Student Cadre Scholarship - Tianjin University (2020)
  • First-Class Merit Student Scholarship - Tianjin University (2019)
  • Third Prize - National College Student Mathematics Competition (2019)
  • Second Prize - Tianjin College Student Mathematics Competition (2019)