Venue: Tencent Meeting 231-541-097
Abstract: Reinforcement Learning (RL) has attracted considerable interest from both industry and academia during the past few years. The study of RL algorithms with provable rates of convergence, however, is still in its infancy. In this talk, we discuss some recent progress that bridges RL with stochastic nonlinear programming. We pay special attention to online reinforcement learning, which aims to continuously improve system performance in situ. More specifically, we introduce a general class of policy mirror descent (PMD) methods and show that, for discounted Markov decision processes, they achieve linear convergence in the deterministic case and optimal sampling complexity in the stochastic case. We also show how the gradient information can be estimated efficiently online through a few recently proposed conditional temporal difference methods. Extensions of these algorithms to the average-reward and block-coordinate settings will also be discussed.
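For context, a minimal sketch of the prototypical PMD update is given below; it is written for a cost-minimization formulation of the discounted MDP, and the exact variants and regularizers covered in the talk may differ. At each iteration $k$ and every state $s$,
$$
\pi_{k+1}(\cdot \mid s) \in \operatorname*{arg\,min}_{p \in \Delta(\mathcal{A})} \Big\{ \eta_k\, \big\langle Q^{\pi_k}(s,\cdot),\, p \big\rangle + D\big(p,\, \pi_k(\cdot \mid s)\big) \Big\},
$$
where $Q^{\pi_k}$ is the action-value function of the current policy $\pi_k$, $\eta_k > 0$ is a stepsize, $\Delta(\mathcal{A})$ is the probability simplex over the action set, and $D$ is a Bregman divergence (e.g., the Kullback-Leibler divergence) serving as the mirror-descent proximal term. In the stochastic (online) setting, $Q^{\pi_k}$ is replaced by an estimate computed from samples, for instance via temporal difference methods.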
Bio: Guanghui Lan is the A. Russell Chandler III Professor in the H. Milton Stewart School of Industrial and Systems Engineering at the Georgia Institute of Technology. Dr. Lan was on the faculty of the Department of Industrial and Systems Engineering at the University of Florida from 2009 to 2015, after earning his Ph.D. degree from the Georgia Institute of Technology in August 2009. His main research interests lie in optimization and machine learning. His academic honors include being a finalist for the Mathematical Optimization Society Tucker Prize (2012), first place in the INFORMS Junior Faculty Interest Group Paper Competition (2012), and the National Science Foundation CAREER Award (2013). Dr. Lan serves as an associate editor for Mathematical Programming, SIAM Journal on Optimization, and Computational Optimization and Applications. He is also an associate director of the Center for Machine Learning at Georgia Tech.