After starting my graduate studies, I explored different aspects of computer vision. Among the possible tracks, I chose to work on video analysis, specifically human action recognition. Vision-based human action recognition is defined as the process of labeling image sequences with human action labels; in other words, it is the capability to automatically analyze videos to detect and identify human actions.
This project took shape around the idea of adapting simple, well-established image classification methods to video (image sequences). We designed 3D kernel descriptors for extracting discriminative features from depth sequences. A depth sequence is a series of images (a video) produced by a depth sensor such as the Microsoft Kinect. These images provide a 3D representation of the scene that is not dependent on lighting conditions.
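To give a feel for what "extracting features from depth data" means, here is a minimal, hypothetical sketch: it computes a gradient-orientation histogram over a depth patch, using depth gradients as a rough proxy for local surface orientation. This is only an illustration of the general idea, not the 3D kernel descriptors described above.

```python
import numpy as np

def depth_gradient_descriptor(depth_patch, n_bins=8):
    """Toy gradient-orientation descriptor for a depth patch.

    Hypothetical illustration of depth-based feature extraction,
    NOT the kernel descriptors from the papers.
    """
    # Depth gradients approximate local surface orientation.
    gy, gx = np.gradient(depth_patch.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)  # range [-pi, pi]

    # Hard-assign gradient magnitudes into orientation bins.
    bins = np.floor((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    desc = np.zeros(n_bins)
    np.add.at(desc, bins.ravel(), mag.ravel())

    # L2-normalize so the descriptor is invariant to overall gradient scale.
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc

# Example: a synthetic 16x16 depth patch (a tilted plane).
patch = np.add.outer(np.arange(16) * 0.5, np.arange(16) * 0.2)
d = depth_gradient_descriptor(patch)
print(d.shape)  # (8,)
```

A per-frame descriptor like this would then be aggregated over the sequence (e.g. pooled over space and time) before classification.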
Our kernel descriptors are designed to capture discriminative information from these 3D representations, and they yielded substantial performance improvements in action recognition models. Our method outperformed state-of-the-art approaches in the field of human action recognition, and it led to a paper published at the IEEE International Conference on Automatic Face and Gesture Recognition (FG).
Building on the extensibility of this approach, we continued developing the model and added support for two modalities (RGB and depth sequences) to achieve even higher performance on challenging action recognition datasets. The new version reached 100% accuracy on one well-known public action recognition dataset. These results were published in a second paper in the Elsevier journal Computer Vision and Image Understanding (CVIU).
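One simple way to combine two modalities is late fusion: compute a feature vector per modality, normalize each, and concatenate them before classification. The sketch below illustrates that generic pattern only; it is an assumption for illustration, not the fusion scheme from the CVIU paper.

```python
import numpy as np

def fuse_modalities(rgb_feat, depth_feat):
    """Hypothetical late fusion by concatenation.

    L2-normalize each modality's feature vector so neither modality
    dominates by scale, then concatenate. Illustrative sketch only,
    not the method from the published papers.
    """
    def l2(v):
        v = np.asarray(v, dtype=np.float64)
        n = np.linalg.norm(v)
        return v / n if n > 0 else v

    return np.concatenate([l2(rgb_feat), l2(depth_feat)])

# Example with small made-up feature vectors.
fused = fuse_modalities([1.0, 2.0, 2.0], [3.0, 4.0])
print(fused.shape)  # (5,)
```

The fused vector would then feed a standard classifier (e.g. an SVM), trained on labeled action sequences.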
For more details on these research projects, you can check out the published papers here: