Matrix Power Normalized Covariance Pooling For Deep Convolutional Networks

Overview

By stacking deeper layers of convolutions and nonlinearities, convolutional networks (ConvNets) effectively learn features from low level to high level, along with discriminative representations. Since the end goal of large-scale recognition is to delineate the complex boundaries of thousands of classes in a high-dimensional space, adequate exploration of feature distributions is important for realizing the full potential of ConvNets. However, state-of-the-art work concentrates largely on designing deeper or wider architectures, and rarely explores feature statistics beyond first order.

To make use of second-order statistics, we propose Matrix Power Normalized Covariance pooling (MPN-COV) ConvNets [MPN-COV_ICCV17], in which MPN-COV replaces global average pooling. MPN-COV amounts to robust covariance estimation given a small number of high-dimensional features [RAID-G_CVPR16], a situation commonly seen in the last convolutional layers of state-of-the-art ConvNets. It approximately exploits Riemannian geometry while overcoming the downside of the well-known Log-Euclidean metric. However, MPN-COV computes forward and backward propagation via eigendecomposition (EIG), which is GPU-unfriendly and limits its efficiency. We therefore present fast MPN-COV (i.e., iSQRT-COV), which computes matrix square root normalization with an iterative algorithm involving only matrix multiplications [Fast_MPN-COV_CVPR18]. Fast MPN-COV is very efficient and scales to multi-GPU configurations, while matching the performance of MPN-COV. Based on fast MPN-COV ConvNets, we achieved 1st place (1/59) in the large-scale, fine-grained iNaturalist 2018 Challenge, spanning 8000 species, at FGVC5, CVPR 2018.

Besides global covariance pooling, we also present global Gaussian pooling, which summarizes the feature distribution of an image as a Gaussian [G2DeNet_CVPR17]. By embedding the Gaussian as the square root of a positive definite matrix built from its mean vector and covariance matrix, based on information geometry [L2EMG_PAMI17], the Gaussian distribution can be plugged into deep ConvNets for end-to-end training.
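As a rough illustration of the Gaussian embedding described above, the following NumPy sketch builds a positive semidefinite matrix from the mean vector and covariance matrix and takes its square root. The block layout [[Sigma + mu mu^T, mu], [mu^T, 1]] is an assumption about the form of the embedding, and the function name gaussian_embedding is hypothetical; only the forward computation is shown, not the end-to-end training.

```python
import numpy as np

def gaussian_embedding(X):
    """Embed the Gaussian fitted to features X (d channels x N positions)
    as the square root of a (d+1) x (d+1) symmetric PSD matrix.

    Assumed form (an illustration, not necessarily the authors' exact layer):
        M = [[Sigma + mu mu^T, mu],
             [mu^T,            1 ]]
    """
    d, N = X.shape
    mu = X.mean(axis=1, keepdims=True)      # mean vector, shape (d, 1)
    Xc = X - mu                             # centered features
    Sigma = Xc @ Xc.T / N                   # sample covariance, shape (d, d)

    M = np.block([[Sigma + mu @ mu.T, mu],
                  [mu.T, np.ones((1, 1))]])

    # Matrix square root through the eigendecomposition of the symmetric M.
    w, U = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)               # guard against tiny negative round-off
    return (U * np.sqrt(w)) @ U.T

# Example: 6-dimensional features at 10 spatial positions.
rng = np.random.default_rng(0)
G = gaussian_embedding(rng.standard_normal((6, 10)))
```

The result is a symmetric (d+1) x (d+1) matrix, so it can be passed to later layers in the same way as a normalized covariance matrix.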

 

References

[MPN-COV_ICCV17] Peihua Li, Jiangtao Xie, Qilong Wang and Wangmeng Zuo. Is Second-order Information Helpful for Large-scale Visual Recognition? IEEE Int. Conf. on Computer Vision (ICCV), pp. 2070-2078, 2017.

[Fast_MPN-COV_CVPR18] Peihua Li, Jiangtao Xie, Qilong Wang and Zilin Gao. Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization. IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 947-955, 2018.

[RAID-G_CVPR16] Qilong Wang, Peihua Li, Wangmeng Zuo and Lei Zhang. RAID-G: Robust Estimation of Approximate Infinite Dimensional Gaussian with Application to Material Recognition. IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 4433-4441, 2016.

[G2DeNet_CVPR17] Qilong Wang, Peihua Li and Lei Zhang. G2DeNet: Global Gaussian Distribution Embedding Network and Its Application to Visual Recognition. IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 2730-2739, 2017. (Oral presentation)

[L2EMG_PAMI17] Peihua Li, Qilong Wang, Hui Zeng and Lei Zhang. Local Log-Euclidean Multivariate Gaussian Descriptor and Its Application to Image Classification. IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), 39(4): 803-817, 2017.

 


Implementation


Implementation of [MPN-COV_ICCV17]

We compute the matrix power via eigendecomposition (EIG), following the methodology of matrix back-propagation. Through EIG, the matrix power is transformed into a power of the eigenvalues. We implement MPN-COV with the MatConvNet package, and the source code is available at https://github.com/jiangtaoxie/MPN-COV-ConvNet.
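The forward computation described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the released MatConvNet code: the function names covariance_pool and matrix_power_eig are hypothetical, the exponent alpha = 0.5 is used as an example value, and the backward pass (matrix back-propagation) is omitted.

```python
import numpy as np

def covariance_pool(X):
    """Sample covariance of features X with shape (d, N):
    d channels, N spatial positions."""
    N = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)  # center each channel
    return Xc @ Xc.T / N

def matrix_power_eig(S, alpha=0.5):
    """S**alpha for a symmetric positive semidefinite S.

    With S = U diag(w) U^T, the matrix power reduces to a power of the
    eigenvalues: S**alpha = U diag(w**alpha) U^T.
    """
    w, U = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)               # guard against tiny negative round-off
    return (U * w**alpha) @ U.T

# Example: fewer positions than channels gives a rank-deficient covariance,
# which the eigenvalue power still handles.
rng = np.random.default_rng(0)
S = covariance_pool(rng.standard_normal((8, 5)))
P = matrix_power_eig(S, alpha=0.5)
```

Because the eigenvalues are only raised to a power, a rank-deficient covariance matrix poses no difficulty here, whereas a matrix logarithm would be undefined for zero eigenvalues.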


Implementation of [Fast_MPN-COV_CVPR18]

We propose a fast MPN-COV method for computing the matrix square root. The key is a directed acyclic graph with an iteration loop, in which pre-normalization guarantees convergence of the subsequent Newton-Schulz iteration, while post-compensation restores the data scale changed by pre-normalization. We implement fast MPN-COV with several deep learning frameworks, e.g., PyTorch, TensorFlow and MatConvNet. The source code is available at https://github.com/jiangtaoxie/fast-MPN-COV.
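The three stages above (pre-normalization, coupled Newton-Schulz iteration, post-compensation) can be sketched in NumPy as follows. This is a forward-only illustration under stated assumptions, not the released implementation: the function name isqrt_newton_schulz is hypothetical, normalization by the trace is assumed, and the default iteration count is an illustrative choice.

```python
import numpy as np

def isqrt_newton_schulz(S, num_iters=5):
    """Approximate square root of a symmetric positive definite matrix S
    using only matrix multiplications."""
    d = S.shape[0]
    I = np.eye(d)

    # Pre-normalization: dividing by the trace puts all eigenvalues of A
    # in (0, 1], so that ||I - A|| < 1 and the iteration converges.
    tr = np.trace(S)
    A = S / tr

    # Coupled Newton-Schulz iteration: Y -> A^(1/2), Z -> A^(-1/2).
    Y, Z = A, I
    for _ in range(num_iters):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y, Z = Y @ T, T @ Z

    # Post-compensation: undo the scaling introduced by pre-normalization.
    return np.sqrt(tr) * Y

# Example on a well-conditioned covariance-like matrix.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 50))
S = B @ B.T / 50 + 0.5 * np.eye(4)
C = isqrt_newton_schulz(S, num_iters=15)
```

A small number of iterations gives only an approximate square root; more iterations tighten the approximation at the cost of extra matrix multiplications, and matrices with very small eigenvalues relative to the trace converge more slowly.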