Open Problem in Optimization

The rise of adaptive gradient methods has been a tremendous success story. How can we explain their performance gains over vanilla stochastic gradient descent? Some theoretical arguments suggest they should offer no advantage, yet their prevalence in practice is undisputed.

In this open problem, jointly written with Xinyi Chen, we lay out a possible direction for explaining the performance improvements of adaptive gradient methods. If successful, a solution to the open problem would explain the improvements observed in training deep neural networks: the speedup in convergence over stochastic gradient descent can be as large as the square root of the number of parameters of the neural network being trained.
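To fix ideas, here is a minimal sketch contrasting the two update rules being compared, with diagonal AdaGrad standing in for the adaptive family. The quadratic toy objective, the learning rates, and all function names below are illustrative choices for this post, not part of the conjecture.

```python
import numpy as np

def sgd_step(x, grad, lr=0.1):
    """Vanilla SGD: one global learning rate for every coordinate."""
    return x - lr * grad

def adagrad_step(x, grad, accum, lr=0.1, eps=1e-8):
    """Diagonal AdaGrad: each coordinate is scaled by the inverse square
    root of its accumulated squared gradients, so coordinates that have
    seen small gradients take relatively larger steps."""
    accum = accum + grad ** 2
    return x - lr * grad / (np.sqrt(accum) + eps), accum

# Toy comparison on a badly scaled quadratic f(x) = 0.5 * x^T diag(s) x,
# whose per-coordinate curvature spans several orders of magnitude.
rng = np.random.default_rng(0)
d = 100
scales = np.logspace(-3, 0, d)      # ill-conditioned diagonal curvature
x_sgd = x_ada = rng.standard_normal(d)
accum = np.zeros(d)

for _ in range(500):
    x_sgd = sgd_step(x_sgd, scales * x_sgd)          # gradient is diag(s) x
    x_ada, accum = adagrad_step(x_ada, scales * x_ada, accum)

print("SGD loss:    ", 0.5 * np.sum(scales * x_sgd ** 2))
print("AdaGrad loss:", 0.5 * np.sum(scales * x_ada ** 2))
```

On this toy problem the per-coordinate step sizes let AdaGrad keep making progress along low-curvature directions where a single global learning rate stalls; the conjecture concerns how large an advantage of this kind can provably be in general.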

The Prize

A prize of $500 will be awarded to the successful solver(s), for either a proof or a disproof of the conjecture.