Densify Your Labels: Unsupervised Clustering with Bipartite Matching for Weakly Supervised Point Cloud Segmentation


Abstract

We propose a weakly supervised semantic segmentation method for point clouds that predicts "per-point" labels from just "whole-scene" annotations while achieving the performance of recent fully supervised approaches. Our core idea is to propagate the scene-level labels to each point in the point cloud by creating pseudo labels in a conservative way. Specifically, we over-segment point cloud features via unsupervised clustering and associate scene-level labels with clusters through bipartite matching, thus propagating scene labels only to the most relevant clusters, leaving the rest to be guided solely via unsupervised clustering. We empirically demonstrate that over-segmentation and bipartite assignment play a crucial role. We evaluate our method on the ScanNet and S3DIS datasets, outperforming the state of the art, and demonstrate that we can achieve results comparable to fully supervised methods.
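To make the two-step idea concrete, below is a minimal, illustrative sketch, not the paper's implementation. It assumes a hypothetical helper `pseudo_labels`, k-means as the unsupervised clustering, per-point class scores from some classification head, and the negative mean class score over a cluster as the matching cost; the cluster count and feature dimensions are likewise our assumptions.

```python
# Sketch: over-segment point features with unsupervised clustering, then
# assign scene-level labels to clusters via bipartite matching. All
# hyper-parameters and the cost definition are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from scipy.optimize import linear_sum_assignment

def pseudo_labels(features, scene_labels, class_scores, n_clusters=32):
    """features:     (N, D) per-point features from a backbone
    scene_labels: list of class ids present in the scene annotation
    class_scores: (N, C) per-point class probabilities
    Returns an (N,) array of pseudo labels; -1 marks points left to
    be guided by unsupervised clustering alone."""
    # 1) Over-segmentation: many more clusters than scene classes,
    #    so labels are propagated conservatively (n_clusters >= len(scene_labels)).
    assign = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)

    # 2) Cost of assigning each scene label to each cluster:
    #    negative mean score of that class over the cluster's points.
    cost = np.array([
        [-class_scores[assign == k, c].mean() for k in range(n_clusters)]
        for c in scene_labels
    ])  # shape: (len(scene_labels), n_clusters)

    # 3) Bipartite matching: each scene label claims one best cluster;
    #    unmatched clusters keep the "ignore" label -1.
    rows, cols = linear_sum_assignment(cost)
    labels = np.full(features.shape[0], -1, dtype=int)
    for r, k in zip(rows, cols):
        labels[assign == k] = scene_labels[r]
    return labels
```

Because the cost matrix is rectangular (fewer scene labels than clusters), `linear_sum_assignment` leaves the surplus clusters unmatched, which is exactly the conservative behavior the abstract describes.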


Interactive Point Cloud Visualizations

We present interactive visualizations of the point clouds used in the paper. Use the mouse to interact with each point cloud: scroll to zoom in and out, hold Shift+Left Mouse Button to pan the camera, and drag with the Left Mouse Button to rotate it.

Qualitative Results (Figure 3)

Note that our method predicts results that are more consistent, both visually and spatially, than CAM [1]. Our bootstrapping method (Ours*) further improves performance.

Panels (left to right): CAM [1], Ours, Ours*, Ground Truth.

Primitive Matching (Figure 4)

We show that our proposed bipartite matching performs better than naïve matching. Black denotes unmatched primitives or ignored points. A toy comparison of the two matching schemes follows the panel overview below.

Panels (left to right): naïve matching (primitives, matched semantics), our bipartite matching (primitives, matched semantics), Ground Truth.
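To illustrate why the two schemes differ, here is a hedged toy example. We read "naïve matching" as an independent per-primitive argmax over class scores, which is our assumption rather than the paper's exact definition; the score values are made up.

```python
# Toy contrast between naive matching and bipartite matching over a
# primitives-by-classes score matrix. Scores are fabricated; "naive"
# here means independent per-primitive argmax.
import numpy as np
from scipy.optimize import linear_sum_assignment

# Mean class scores for 4 primitives over 2 scene classes.
scores = np.array([
    [0.90, 0.10],   # primitive 0: strongly class 0
    [0.80, 0.60],   # primitive 1: also prefers class 0
    [0.70, 0.65],   # primitive 2: nearly ambiguous
    [0.20, 0.30],   # primitive 3: weak everywhere
])

# Naive matching: every primitive takes its best class independently,
# so class 0 greedily absorbs primitives 0, 1 and 2.
naive = scores.argmax(axis=1)
print("naive:    ", naive)           # [0 0 0 1]

# Bipartite matching: each class claims at most one primitive, chosen
# to maximize total score; unmatched primitives stay -1 ("black").
rows, cols = linear_sum_assignment(-scores.T)  # classes x primitives
bipartite = np.full(len(scores), -1)
bipartite[cols] = rows
print("bipartite:", bipartite)       # [ 0 -1  1 -1]
```

The bipartite version trades coverage for precision: ambiguous primitives (1 and 3) are left unmatched rather than being swallowed by the dominant class, mirroring the unmatched black primitives in the figure.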

Bibliography

  • [1] J. Wei, G. Lin, K.-H. Yap, T.-Y. Hung, and L. Xie. Multi-Path Region Mining for Weakly Supervised 3D Semantic Segmentation on Point Clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 4384-4393.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 42201481, in part by the Scientific Research Foundation of the Hunan Education Department under Grant 21B0332, and in part by the Science and Technology Plan Project Fund of Hunan Province under Grant 2023JJ40024. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the A100 GPU used for this research. Lastly, we would like to thank Yuhe Jin for the insightful discussions.