Depth for (and from) Convolutional Neural Networks
All state-of-the-art image classification, recognition and segmentation models use convolutions. These (mostly) have a fixed spatial extent in the image plane, typically using filters of 3×3 pixels. In this talk I will argue that convolutions should instead have a fixed spatial extent in the real world, in XYZ space. We introduce a novel convolutional operator that takes RGB + depth as input and yields filters of (approximately) fixed size in the real world. We exploit these for image segmentation, and also show that our proposed RGB-D Neighbourhood Convolution remains beneficial when the depth channel D is inferred from RGB rather than measured. If time permits I'll dive further into depth prediction with GANs, showing that GANs only improve monocular depth estimation when the image reconstruction loss is rather unconstrained.
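To make the idea of a fixed real-world extent concrete, here is a minimal sketch. It is not the RGB-D Neighbourhood Convolution from the talk, only the underlying geometry. Under an assumed pinhole camera model, a window of fixed metric size covers roughly focal length × size / depth pixels, so nearby surfaces need a wider pixel footprint than distant ones. The function names, the focal length, and the choice of a dilated 3×3 kernel are all illustrative assumptions.

```python
import numpy as np

def pixel_extent(depth_m, focal_px, metric_size_m):
    """Approximate width in pixels of a window spanning metric_size_m metres.

    Assumes a pinhole camera: an object of size s at depth Z projects to
    about focal_px * s / Z pixels.
    """
    depth_m = np.asarray(depth_m, dtype=float)
    return focal_px * metric_size_m / depth_m

def dilation_for_3x3(depth_m, focal_px, metric_size_m, min_dilation=1):
    """Per-pixel dilation so a 3x3 kernel spans roughly metric_size_m metres.

    A 3x3 kernel with dilation d has 2*d pixels between its outer taps,
    so d is chosen as half the desired pixel extent (hypothetical scheme).
    """
    extent = pixel_extent(depth_m, focal_px, metric_size_m)
    dilation = np.rint(extent / 2.0).astype(int)
    return np.maximum(dilation, min_dilation)

# Example: three pixels at 1 m, 2 m and 5 m, with an assumed focal length
# of 500 px and a target window of 0.2 m.
depths = np.array([1.0, 2.0, 5.0])
print(dilation_for_3x3(depths, focal_px=500.0, metric_size_m=0.2))
```

With a standard convolution the dilation would be the same at every pixel; here it shrinks as depth grows, which is the sense in which the filter size is (approximately) fixed in XYZ space instead of in the image plane.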