Publications

Abstract: Depth information is essential for on-board perception in autonomous driving and driver assistance. Monocular depth estimation (MDE) is very appealing since it allows appearance and depth to be in direct pixel-wise correspondence without further calibration. The best MDE models are based on Convolutional Neural Networks (CNNs) trained in a supervised manner, i.e., assuming pixel-wise ground truth (GT). Usually, this GT is acquired at training time through a calibrated multi-modal suite of sensors. However, using only a monocular system at training time as well is cheaper and more scalable. This is possible by relying on structure-from-motion (SfM) principles to generate self-supervision. Nevertheless, problems such as camouflaged objects, visibility changes, static-camera intervals, textureless areas, and scale ambiguity diminish the usefulness of such self-supervision. In this paper, we perform monocular depth estimation by virtual-world supervision (MonoDEVS) and real-world SfM self-supervision. We compensate for the SfM self-supervision limitations by leveraging virtual-world images with accurate semantic and depth supervision and by addressing the virtual-to-real domain gap. Our MonoDEVSNet outperforms previous MDE CNNs trained on monocular and even stereo sequences.
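
A minimal sketch of the training idea described in the abstract (hypothetical function and variable names, not the MonoDEVSNet code): virtual-world samples carry accurate ground-truth depth and contribute a supervised term, while real-world samples contribute an SfM-style photometric term, here computed between the target frame and a source frame assumed to be already warped into the target view.

```python
import torch
import torch.nn.functional as F

def monodevs_style_loss(pred_depth_virtual, gt_depth_virtual,
                        real_target, real_source_warped,
                        lambda_photo=1.0):
    # Supervised term on virtual-world images with accurate synthetic depth GT.
    sup = F.l1_loss(pred_depth_virtual, gt_depth_virtual)
    # Self-supervised photometric term on real-world images; in practice the warp
    # comes from predicted depth + relative camera pose + intrinsics (omitted here).
    photo = F.l1_loss(real_source_warped, real_target)
    # Weighted combination of both supervision signals (lambda_photo is illustrative).
    return sup + lambda_photo * photo
```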


bibtex: @article{Gurram:2021MonoDEVSNet,
  title={Monocular Depth Estimation through Virtual-world Supervision and Real-world SfM Self-Supervision},
  author={Gurram, Akhil and Tuna, Ahmet Faruk and Shen, Fengyi and Urfalioglu, Onay and L{\'o}pez, Antonio M},
  journal={arXiv preprint arXiv:2103.12209},
  year={2021}
}

Abstract: A crucial component of an autonomous vehicle (AV) is the artificial intelligence (AI) that is able to drive towards a desired destination. Today, there are different paradigms addressing the development of AI drivers. On the one hand, we find modular pipelines, which divide the driving task into sub-tasks such as perception, maneuver planning, and control. On the other hand, we find end-to-end driving approaches that try to learn a direct mapping from raw input sensor data to vehicle control signals. The latter are relatively less studied, but are gaining popularity since they are less demanding in terms of sensor data annotation. This paper focuses on end-to-end autonomous driving. So far, most proposals relying on this paradigm assume RGB images as input sensor data. However, AVs will not be equipped only with cameras, but also with active sensors providing accurate depth information (e.g., LiDARs). Accordingly, this paper analyses whether combining RGB and depth modalities, i.e., using RGBD data, produces better end-to-end AI drivers than relying on a single modality. We consider multimodality based on early, mid, and late fusion schemes, both in multisensory and single-sensor (monocular depth estimation) settings. Using the CARLA simulator and conditional imitation learning (CIL), we show how, indeed, early fusion multimodality outperforms single-modality.
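
To illustrate the early-fusion scheme mentioned above, here is a minimal sketch (assumed architecture and names, not the authors' released model): RGB and depth are concatenated along the channel axis before the first convolution, so a single backbone processes a 4-channel RGBD input.

```python
import torch
import torch.nn as nn

class EarlyFusionBackbone(nn.Module):
    """Toy CIL-style backbone consuming RGB (3 ch) + depth (1 ch) fused at the input."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=5, stride=2, padding=2),  # 4 = 3 RGB + 1 depth
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 3)  # e.g., steer, throttle, brake

    def forward(self, rgb, depth):
        x = torch.cat([rgb, depth], dim=1)  # early fusion: channel concatenation
        x = self.features(x).flatten(1)
        return self.head(x)

# Usage: rgb is (B,3,H,W), depth is (B,1,H,W), coming from an active sensor
# projection or from a monocular depth estimator.
model = EarlyFusionBackbone()
controls = model(torch.rand(2, 3, 88, 200), torch.rand(2, 1, 88, 200))
```

Mid and late fusion would instead merge the two modalities at intermediate feature maps or at the output of separate per-modality streams, respectively.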

 

bibtex: @article{Xiao:2020,
  title={Multimodal end-to-end autonomous driving},
  author={Xiao, Yi and Codevilla, Felipe and Gurram, Akhil and Urfalioglu, Onay and L{\'o}pez, Antonio M},
  journal={IEEE Transactions on Intelligent Transportation Systems},
  year={2020},
  publisher={IEEE}
}

Abstract: Depth estimation provides essential information to perform autonomous driving and driver assistance. A promising line of work consists of introducing additional semantic information about the traffic scene when training CNNs for depth estimation. In practice, this means that the depth data used for CNN training is complemented with images having pixel-wise semantic labels, where the same raw training data is associated with both types of ground truth, i.e., depth and semantic labels. The main contribution of this paper is to show that this hard constraint can be circumvented, i.e., that we can train CNNs for depth estimation by leveraging the depth and semantic information coming from heterogeneous datasets. In order to illustrate the benefits of our approach, we combine the KITTI depth and Cityscapes semantic segmentation datasets, outperforming state-of-the-art results on monocular depth estimation.
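
A minimal sketch of the heterogeneous-dataset idea (illustrative names and layers, not the paper's implementation): a shared encoder feeds a depth head and a semantic head, and since each batch comes from a different dataset, only the loss matching that batch's ground truth is applied.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedEncoderTwoHeads(nn.Module):
    def __init__(self, num_classes=19):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.depth_head = nn.Conv2d(64, 1, 3, padding=1)           # per-pixel depth
        self.sem_head = nn.Conv2d(64, num_classes, 3, padding=1)   # per-pixel class logits

    def forward(self, x):
        f = self.encoder(x)
        return self.depth_head(f), self.sem_head(f)

model = SharedEncoderTwoHeads()

def training_loss(batch, source):
    """'source' says which ground truth this batch carries: 'depth' or 'semantic'."""
    depth_pred, sem_logits = model(batch["image"])
    if source == "depth":      # e.g., a KITTI-style batch with (B,1,H,W) depth GT
        return F.l1_loss(depth_pred, batch["depth_gt"])
    else:                      # e.g., a Cityscapes-style batch with (B,H,W) class labels
        return F.cross_entropy(sem_logits, batch["sem_gt"])
```

Alternating such batches lets the depth branch benefit from semantic supervision through the shared encoder, even though no single image carries both kinds of ground truth.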

 

bibtex: @article{Gurram:2020,
  title={Semantic Monocular Depth Estimation Based on Artificial Intelligence},
  author={Gurram, Akhil and Urfalioglu, Onay and Halfaoui, Ibrahim and Bouzaraa, Fahd and Lopez, Antonio M},
  journal={IEEE Intelligent Transportation Systems Magazine},
  year={2020},
  publisher={IEEE}
}

Abstract: Depth estimation provides essential information to perform autonomous driving and driver assistance. In particular, monocular depth estimation is interesting from a practical point of view, since using a single camera is cheaper than many other options and avoids the need for continuous calibration strategies as required by stereo-vision approaches. State-of-the-art methods for monocular depth estimation are based on Convolutional Neural Networks (CNNs). A promising line of work consists of introducing additional semantic information about the traffic scene when training CNNs for depth estimation. In practice, this means that the depth data used for CNN training is complemented with images having pixel-wise semantic labels, which usually are difficult to annotate (e.g., crowded urban images). Moreover, so far it is common practice to assume that the same raw training data is associated with both types of ground truth, i.e., depth and semantic labels. The main contribution of this paper is to show that this hard constraint can be circumvented, i.e., that we can train CNNs for depth estimation by leveraging the depth and semantic information coming from heterogeneous datasets. In order to illustrate the benefits of our approach, we combine the KITTI depth and Cityscapes semantic segmentation datasets, outperforming state-of-the-art results on monocular depth estimation.

 
bibtex: @inproceedings{Gurram:2018,
  title={Monocular depth estimation by learning from heterogeneous datasets},
  author={Gurram, Akhil and Urfalioglu, Onay and Halfaoui, Ibrahim and Bouzaraa, Fahd and L{\'o}pez, Antonio M},
  booktitle={2018 IEEE Intelligent Vehicles Symposium (IV)},
  pages={2176--2181},
  year={2018},
  organization={IEEE} 
}

Patents

• Akhil Gurram, Onay Urfalioglu: Domain Adaptation based on Self-supervised Depth and Relative-Pose Estimation. European Patent – Pending/Filed-86937254.

• Onay Urfalioglu, Akhil Gurram, Ibrahim Halfaoui: Sampling-based Self-Supervised Depth and Pose Estimation. European Patent – Pending/Filed-86934297.

• Onay Urfalioglu, Akhil Gurram, Fahd Bouzaraa: Learnable Localization using a map with camera frames. European Patent – Pending/Filed.