Saturday, October 16, 2021, 3:00pm to 4:15pm
About this Event
Please register to attend the EE Seminars via Zoom this fall. Note that this is a new link from last year's series. Once you register, you will receive a recurring Zoom link for all 2021-22 EE Seminar events; you only need to register once. After signing up, please click "Add to calendar" for Google or Outlook reminders!
Multiagent reinforcement learning systems have recently achieved significant success in many AI challenges. Two crucial components behind these successes are function approximation and self-play. Function approximation is commonly deployed in modern applications with large state spaces to approximate either the value function or the policy, while self-play enables the learner to improve by playing against itself rather than against human experts. Although recent progress in RL theory addresses a rich set of RL problems, those results are mostly restricted to the single-agent setting with limited classes of function approximation. This talk presents our progress on addressing both challenges. We first propose a new algorithm that provably finds a near-optimal policy using a polynomial number of samples for any single-agent RL problem with low Bellman-Eluder dimension, a generic complexity measure that captures a majority of existing tractable RL problems. We then extend these results to the setting of two-player zero-sum Markov games, proposing a new self-play algorithm that provably finds the Nash equilibrium policy using a polynomial number of samples. A key component of our self-play algorithm is the exploiter, which facilitates the learning of the main player by deliberately exploiting her weaknesses.
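To give a flavor of the main-player/exploiter dynamic described above, here is a minimal toy sketch (not the speaker's algorithm, and much simpler than the Markov-game setting of the talk): in a two-player zero-sum matrix game (rock-paper-scissors), a main player updates via exponential weights while an exploiter always plays a best response to her current strategy, deliberately targeting her weakness. The time-averaged main strategy then has low exploitability, i.e., approaches the Nash equilibrium. All function names and the step size `eta` are illustrative choices, not from the source.

```python
import numpy as np

# Row player's payoff in rock-paper-scissors; the column player (exploiter)
# receives the negation, making the game zero-sum with value 0.
A = np.array([[ 0.0, -1.0,  1.0],
              [ 1.0,  0.0, -1.0],
              [-1.0,  1.0,  0.0]])

def main_vs_exploiter(A, iters=5000, eta=0.05):
    """Main player runs exponential weights (Hedge); the exploiter
    best-responds to her current mixed strategy every round."""
    n, m = A.shape
    log_w = np.zeros(n)      # log-weights for the main player's actions
    avg = np.zeros(n)        # running sum of the main player's strategies
    for _ in range(iters):
        main = np.exp(log_w - log_w.max())
        main /= main.sum()
        avg += main
        # Exploiter: pick the column minimizing the main player's payoff.
        j = int(np.argmin(main @ A))
        # Main player: reweight actions by their payoff vs the exploiter.
        log_w += eta * A[:, j]
    return avg / iters       # time-averaged strategy (approx. Nash)

strategy = main_vs_exploiter(A)
# How much an optimal opponent could still gain against the averaged strategy:
exploitability = -float(np.min(strategy @ A))
```

By the standard no-regret argument, the average strategy's exploitability shrinks on the order of 1/sqrt(T); for rock-paper-scissors it approaches the uniform Nash equilibrium (1/3, 1/3, 1/3). The exploiter's best responses are exactly what drives the main player to cover her weaknesses.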