Research

Neural Architecture Search

Neural Architecture Search (NAS) is an emerging technique for automatically designing neural network structures. We designed a fast NAS method for object detection, and the architecture it discovered surpasses state-of-the-art object detection models.

Vision+Language Tasks

Vision-and-language tasks such as image captioning and visual question answering require an understanding of both visual and linguistic information. We are particularly interested in bridging vision, language and knowledge. Several of our papers on this topic were published in TPAMI and CVPR.

Instance-level Recognition and Re-identification

Recognizing object instances of the same category (such as faces, persons or cars) is challenging due to the large intra-instance variation and small inter-instance variation. We constructed a large dataset for vehicle re-identification from aerial views and were top-ranked in related AI competitions.

Reading Text from Images

Text in natural scene images contains rich semantic information that is crucial for visual understanding and reasoning. Although OCR has been studied extensively, reading irregular text of arbitrary shape remains challenging. Some of our work was published in ICCV and AAAI.

Semidefinite Programming in CV

A variety of computer vision (CV) problems, such as segmentation, image registration/matching and image denoising/restoration, can be formulated as Binary Quadratic Problems (BQPs). We developed several fast and accurate BQP solvers based on semidefinite programming (SDP) relaxation. Details can be found in our publications in TPAMI, IJCV and CVPR.
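
As a rough illustration of what an SDP relaxation of a BQP looks like, the generic textbook form is sketched below. It is not necessarily the exact formulation used in our publications: the symmetric cost matrix $A$, the $\{-1,+1\}$ encoding of the binary variables, and the lifted matrix $X$ are assumptions made for this sketch.

```latex
% Generic BQP and its standard SDP relaxation (illustrative; A is assumed symmetric).
\begin{align}
  \text{BQP:} \quad
    & \min_{x \in \{-1,+1\}^n} \; x^\top A x \\
  \text{Lifting:} \quad
    & x^\top A x = \langle A, X \rangle
      \quad \text{with} \quad X = x x^\top, \\
    & X = x x^\top \text{ for some } x \in \{-1,+1\}^n
      \;\Longleftrightarrow\;
      X \succeq 0,\ \operatorname{diag}(X) = \mathbf{1},\ \operatorname{rank}(X) = 1 \\
  \text{SDP relaxation:} \quad
    & \min_{X} \; \langle A, X \rangle
      \quad \text{s.t.} \quad \operatorname{diag}(X) = \mathbf{1},\ X \succeq 0
\end{align}
```

Dropping the rank-one constraint makes the problem convex, so its optimal value is a lower bound on the BQP optimum. A binary solution then has to be recovered from the relaxed $X$ by a rounding step, for example random hyperplane rounding.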

Boosting Algorithms for Object Detection

We designed several boosting algorithms specifically for fast object detection. We studied boosting from a new perspective, namely the dual formulation of boosting algorithms.