We propose a weakly supervised semantic segmentation method for point clouds that predicts "per-point" labels from just "whole-scene" annotations while achieving the performance of recent fully supervised approaches. Our core idea is to propagate the scene-level labels to each point in the point cloud by creating pseudo labels in a conservative way. Specifically, we over-segment point cloud features via unsupervised clustering and associate scene-level labels with clusters through bipartite matching, thus propagating scene labels only to the most relevant clusters, leaving the rest to be guided solely via unsupervised clustering. We empirically demonstrate that over-segmentation and bipartite assignment play a crucial role. We evaluate our method on the ScanNet and S3DIS datasets, outperforming the state of the art, and demonstrate that we can achieve results comparable to fully supervised methods.
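For intuition, below is a minimal sketch of the conservative pseudo-labeling idea, not our exact pipeline: per-point features are over-segmented with k-means, and scene-level labels are assigned to clusters one-to-one via the Hungarian algorithm. The function name, the per-class prototypes, and the number of primitives are illustrative assumptions, not names from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.optimize import linear_sum_assignment

def pseudo_labels_via_bipartite_matching(point_feats, scene_labels,
                                         class_prototypes, num_primitives=64):
    """Over-segment point features and propagate scene labels conservatively.

    point_feats:      (N, D) per-point features from a backbone (assumption).
    scene_labels:     list of class indices present in the scene.
    class_prototypes: (C, D) one feature prototype per class (assumption).
    Returns per-point pseudo labels, with -1 for points left unlabeled.
    """
    # Over-segmentation: unsupervised clustering into many small primitives.
    km = KMeans(n_clusters=num_primitives, n_init=10).fit(point_feats)
    centroids = km.cluster_centers_                      # (K, D)

    # Affinity between each primitive and each scene-level class.
    protos = class_prototypes[np.asarray(scene_labels)]  # (S, D)
    cost = -centroids @ protos.T                         # (K, S), negated similarity

    # Bipartite matching: each scene label claims exactly one primitive,
    # so labels reach only the most relevant clusters (conservative).
    rows, cols = linear_sum_assignment(cost)

    pseudo = np.full(point_feats.shape[0], -1, dtype=np.int64)  # -1 = ignored
    for k, s in zip(rows, cols):
        pseudo[km.labels_ == k] = scene_labels[s]
    return pseudo
```

Unmatched primitives keep the label -1 and are supervised only by the unsupervised clustering objective, which is what makes the propagation conservative.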
We present interactive visualizations of the point clouds used in the paper. Use the mouse to interact with each point cloud: scroll to zoom in/out, Shift+Left Mouse Button to move the camera, and Left Mouse Button to rotate the camera.
Please note that our method predicts more consistent results, both visually and spatially, than CAM [1]. Our bootstrapping method further improves the performance of our method.
We show that our proposed bipartite matching performs better than naïve matching. Black denotes unmatched primitives or ignored points.
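To see why, consider the toy contrast below on a made-up primitive-to-class affinity matrix (the values are purely illustrative): naïve argmax matching lets a dominant class absorb nearly every primitive, while one-to-one bipartite matching leaves low-affinity primitives unmatched (the points shown in black above).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy affinity between 4 primitives (rows) and 2 scene classes (columns).
affinity = np.array([[0.9, 0.2],
                     [0.8, 0.3],
                     [0.7, 0.1],
                     [0.4, 0.6]])

# Naive matching: every primitive greedily takes its best class,
# so a dominant class can claim most primitives.
naive = affinity.argmax(axis=1)          # -> [0, 0, 0, 1]

# Bipartite matching: one-to-one assignment; unmatched primitives
# stay unlabeled (-1) instead of receiving a noisy label.
rows, cols = linear_sum_assignment(-affinity)
bipartite = np.full(len(affinity), -1)
bipartite[rows] = cols                   # -> [0, -1, -1, 1]
```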
This work was supported in part by the National Natural Science Foundation of China under Grant 42201481, in part by the Scientific Research Foundation of the Hunan Education Department under Grant 21B0332, and in part by the Science and Technology Plan Project Fund of Hunan Province under Grant 2023JJ40024. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the A100 GPU used for this research. Lastly, we would like to thank Yuhe Jin for the insightful discussions.