Optimizing neural networks using structured probabilistic models
Neural networks have recently driven significant progress in machine learning applications as diverse as vision, speech, and text understanding. Despite much engineering effort to boost the computational efficiency of neural net training, most networks are still trained using variants of stochastic gradient descent. Natural gradient descent, a second-order optimization method, has the potential to speed up training by correcting for the curvature of the loss function. Unfortunately, the exact natural gradient is impractical to compute for large networks because it requires solving a linear system involving the Fisher matrix, whose dimension may be in the millions for modern neural network architectures. The key challenge is to develop approximations to the Fisher matrix which are efficiently invertible, yet accurately reflect its structure.
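For readers unfamiliar with the method, the natural gradient update can be written in the following standard form; the notation is an editorial gloss rather than a formula taken from the talk. The application of the inverse Fisher matrix is exactly the linear solve that becomes impractical at scale.

    % Natural gradient descent step (standard formulation; notation is a gloss, not from the abstract).
    % \theta_t : network weights at step t,   \eta : learning rate,
    % L(\theta) : training loss,   F(\theta_t) : Fisher information matrix.
    \theta_{t+1} \;=\; \theta_t \;-\; \eta\, F(\theta_t)^{-1} \,\nabla_\theta L(\theta_t)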
The Fisher matrix is the covariance of log-likelihood derivatives with respect to the weights of the network. I will present techniques to approximate the Fisher matrix using structured probabilistic models of the computation of these derivatives. Using probabilistic modeling assumptions motivated by the structure of the computation graph and empirical analysis of the distribution over derivatives, I derive approximations to the Fisher matrix which allow the natural gradient to be approximated efficiently. The resulting optimization algorithm is invariant to some common reparameterizations of neural networks, suggesting that it automatically enjoys the computational benefits of these reparameterizations. I show that this method gives significant speedups in the training of two widely used architectures: restricted Boltzmann machines and convolutional networks.
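For concreteness, the Fisher matrix referred to above can be written as the second moment of the log-likelihood gradient (the "score"), which coincides with its covariance because the expected score is zero under the model's own distribution; this is a standard identity, not a formula quoted from the abstract.

    % Fisher information matrix for a model p_\theta(x) (standard definition).
    % The expectation is taken over the model's own distribution, under which
    % E[\nabla_\theta \log p_\theta(x)] = 0, so this equals the covariance of the score.
    F(\theta) \;=\; \mathbb{E}_{x \sim p_\theta}\!\left[
        \nabla_\theta \log p_\theta(x)\, \nabla_\theta \log p_\theta(x)^{\top}
    \right]

Structured approximations of the kind described above replace F with a matrix whose structure makes the linear solve cheap; one illustrative possibility, not necessarily the construction presented in the talk, is a block-diagonal approximation whose blocks can be inverted independently.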
BIO: Roger Grosse is an Assistant Professor of Computer Science at the University of Toronto. He received his Ph.D. in computer science from MIT under the supervision of Bill Freeman, and then spent two years as a postdoc at the University of Toronto. He is a recipient of the NDSEG Graduate Fellowship, the Banting Postdoctoral Fellowship, and outstanding paper awards at the International Conference on Machine Learning (ICML) and the Conference on Uncertainty in Artificial Intelligence (UAI). He is also a co-creator of Metacademy, an open-source website for developing personalized learning plans in machine learning and related fields.