MUHAMMAD USMAN GHANI KHAN received the Ph.D. degree from the University of Sheffield, U.K. His recent work concerns statistical modeling of machine vision signals, specifically natural language descriptions of video streams. He is currently an Associate Professor with the Department of Computer Science & Engineering, University of Engineering & Technology (UET), Lahore, Pakistan. He also leads the National Center for Artificial Intelligence under the Al-Khwarizmi Institute of Computer Science, UET Lahore.
Dr. Usman Ghani has over 15 years of research experience, specifically in the areas of image processing, computer vision, bioinformatics, medical imaging, computational linguistics, and machine learning. He is the Director of the Intelligent Criminology Lab under the National Center of Artificial Intelligence, and the founder and director of five research labs: the Computer Vision & Machine Learning Lab, the Bioinformatics Lab, the Virtual Reality & Gaming Lab, the Data Science Lab, and the Software Systems Research Lab. An experienced teacher and mentor for subjects related to Artificial Intelligence, Machine Learning, and Deep Learning, he has recorded freely available video lectures on YouTube for courses on Bioinformatics, Image Processing, Data Mining & Data Science, and Computer Programming.
Title of Talk: Automatic Surveillance System for Video Streams
The collection of digital images and videos has grown exponentially in recent years, as more and more data becomes available in the form of personal photo albums, handheld camera videos, feature films, and multilingual broadcast news, presenting visual data that ranges from unstructured to highly structured. Today, video accounts for around 80 percent of all Internet traffic. Videos consist of audio and visual content, and are often accompanied by textual information, so the data grows along all three dimensions. Because of this huge increase, qualitative filtering is needed to separate relevant from irrelevant information according to user requirements. In addition, time constraints limit how long one can spend watching videos, so one has to be selective when accessing information. Such a distillation process requires comprehensive information processing, including categorization, description, and explanation of videos.
This talk is concerned with the automatic generation of natural language descriptions that can be used in video indexing, retrieval, and summarization applications. This goes a step beyond keyword-based tagging, since it captures the relations between the keywords associated with a video and thus clarifies their context. Initially, we prepare hand annotations consisting of descriptions for video segments drawn from a self-generated dataset and a state-of-the-art video dataset; analysis of this data offers insights into what humans find interesting in video content. For machine-generated descriptions, conventional image processing techniques are applied to extract high-level features (HLFs) from individual video frames, and natural language descriptions are then produced from these HLFs. Although the feature extraction processes are error-prone at various levels, approaches are explored for combining them to produce coherent descriptions. For scalability, the application of the framework to several different video genres is also discussed. For complete video sequences, a scheme is presented that generates coherent and compact descriptions of video streams by exploiting spatial relations between HLFs and temporal relations between individual frames. Measuring the overlap between machine-generated and human-annotated descriptions shows that the machine-generated descriptions capture contextual information and accord with what humans perceive while watching. Further, a task-based evaluation shows improvement on a video identification task compared with keywords alone. Finally, the application of the generated natural language descriptions to video scene classification is discussed.
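As a rough illustration of the last two stages of such a pipeline, the sketch below turns a hypothetical set of HLFs into a sentence via a fixed template and then scores its word overlap against a human-written annotation. The HLF keys, the template, and the simple overlap measure are illustrative assumptions for this sketch, not the actual system described in the talk.

```python
# Minimal sketch (assumed, not the talk's implementation): template-based
# sentence generation from high-level features (HLFs), plus a simple
# word-overlap score between machine and human descriptions.

def describe(hlfs):
    """Generate one sentence from an HLF dictionary via a fixed template."""
    subject = hlfs.get("subject", "someone")
    action = hlfs.get("action", "is present")
    scene = hlfs.get("scene")
    sentence = f"{subject} {action}"
    if scene:
        sentence += f" in the {scene}"
    return sentence + "."

def word_overlap(machine, human):
    """Fraction of the machine description's words found in the human one."""
    m = {w.strip(".,") for w in machine.lower().split()}
    h = {w.strip(".,") for w in human.lower().split()}
    return len(m & h) / len(m) if m else 0.0

# Hypothetical HLFs extracted from a frame.
hlfs = {"subject": "a man", "action": "is walking", "scene": "park"}
generated = describe(hlfs)   # "a man is walking in the park."
score = word_overlap(generated, "A man walks through a park.")
```

In practice, evaluation of generated descriptions typically uses more robust overlap measures (e.g., n-gram based scores) rather than bag-of-words intersection, but the principle of comparing against human annotations is the same.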