Fall meeting 2019: Deep Vision
The NVPHBV fall meeting will be held on Wednesday 27 November 2019 at the University of Amsterdam.
The topic of the meeting will be “Deep Vision”.
The venue will be the Turingzaal at the Centrum Wiskunde & Informatica (CWI), Science Park 123, Amsterdam.
Program
- 11:30 - 12:00: Welcome, walk-in with coffee & tea
- 12:00 - 13:00: Lunch
- 13:00 - 13:40: Localizing concepts, the few-shot way (Prof. dr. Cees Snoek)
- 13:40 - 14:20: Deep learning in Ophthalmology: Saving vision (Prof. Dr. Clarissa Sánchez)
- 14:20 - 15:00: Depth for (and from) Convolutional Neural Networks (Dr. Thomas Mensink)
- 15:00 - 15:30: Break, coffee & tea
- 15:30 - 16:10: Driver Handheld Cell Phone Use Detection (Dr. ir. Ronald Poppe)
- 16:10 - 16:50: Neurostimulation and pattern recognition in personalised medical intervention for enhancement of cognition and visual perception (Prof. Raymond van Ee)
- 16:50 - 17:10: Introduction to computer vision-based body language recognition (Dr. Harro Stokman)
- 17:10 - 17:30: Deep Learning: the future of warehouses (Ir. Enrico Liscio)
- 17:30 - 18:30: Drinks and networking
Speakers
-
Ir. Enrico Liscio, Deep Learning Developer at Fizyr, Delft
Deep Learning: the future of warehouses
The logistics and e-commerce sectors are growing rapidly and demand ever more automation to keep up with increasing order volumes. The main challenge is the large variability present in warehouses, where a single robotic cell must be able to deal with hundreds of thousands of different products. Deep learning presents itself as a strong candidate solution, thanks to its ability to generalize from a subset of the dataset. Fizyr has successfully developed and integrated a deep learning vision solution that helps robotic integrators handle such a large variation of goods. This presentation gives an overview of Fizyr’s solution and introduces the advantages and challenges of using deep learning in this industrial application, focusing on aspects such as scalability and reliability.
-
Prof. Dr. Clarissa Sánchez, Full Professor of AI and Health at University of Amsterdam / Amsterdam UMC
Deep learning in Ophthalmology: Saving vision
-
Prof. dr. Cees Snoek, University of Amsterdam
Localizing concepts, the few-shot way
Learning to recognize concepts in images and video has witnessed phenomenal progress thanks to improved convolutional networks, more efficient graphics hardware and huge amounts of image annotations. Even when image annotations are scarce, classifying objects and activities has proven more than feasible. However, for the localization of objects and activities, existing deep vision algorithms still depend heavily on many hard-to-obtain image annotations at the box or pixel level. In this talk, I will present recent progress of my team in localizing objects and activities when box and pixel annotations are scarce or completely absent. I will also present a new object localization task along this research direction: given a few weakly-supervised support images, we localize the common object in the query image without any box annotation. Finally, I will present recent results on spatio-temporal activity localization when no annotated box or tube examples are available for training.
-
Dr. Thomas Mensink, Google Research / Associate Professor at University of Amsterdam
Depth for (and from) Convolutional Neural Networks
All state-of-the-art image classification, recognition and segmentation models use convolutions. These (mostly) have a fixed spatial extent in the image plane, using filters of 3×3 pixels. In this talk I will argue that convolutions should instead have a fixed spatial extent in the real world, in XYZ space. We introduce a novel convolutional operator that takes RGB + depth as input and yields (approximately) fixed-size filters in the real world. We exploit these for image segmentation, and also show that our method is beneficial when depth (D) is inferred from RGB and then used in our proposed RGB-D Neighbourhood Convolution. If time permits I’ll dive further into depth prediction with GANs, showing that GANs only improve monocular depth estimation when the image reconstruction loss used is rather unconstrained.
-
Dr. ir. Ronald Poppe, University of Utrecht
Driver Handheld Cell Phone Use Detection
Many road accidents are attributed to in-car phone use. Currently, drivers can only be fined if they are caught red-handed. In anticipation of changing legislation to allow for automated fining, we address the development of computer vision detection algorithms for this task. In this talk, we discuss the technical challenges posed by the limited amount of labeled data, the low image quality and the ambiguous nature of the footage. Instead of pursuing a pure deep learning approach, we rely on domain knowledge to deal with these challenges. We show results, as well as insights into the inner workings of our approach.
-
Prof. Raymond van Ee, Leuven University / Radboud University, Nijmegen / Philips Research, Eindhoven
Neurostimulation and pattern recognition in personalised medical intervention for enhancement of cognition and visual perception
Current medical treatment, including neurostimulation, is based upon a one-size-fits-all approach. Recent findings now contribute to the groundwork for non-pharmacological interventions by providing novel opportunities for individualized neurostimulation to forcefully tap into the residual potential of the brain. Here I present approaches to neurostimulation and pattern recognition for the enhancement of cognition and visual perception. I will further discuss new approaches in deep learning for pattern recognition in behaviour and brain activity.
-
Dr. Harro Stokman, CEO of Kepler Vision Technologies
Introduction to computer vision-based body language recognition
An exciting new field in computer vision is the recognition of the nuances of nonverbal communication between individuals. In this talk, we’ll dive into how body language recognition differs from sign language recognition or visual command language. Next, we discuss what is needed to make body language recognition work. Finally, we give an overview of applications, such as video-based customer support, social interaction with robots, public security, and retail.