Abstract
Motivated by practical needs such as large-scale learning, we study the impact of adaptivity constraints on online learning and decision-making problems. Unlike traditional online learning problems, which allow full adaptivity at every time step, our work investigates models where the learning strategy cannot change frequently, which opens up the possibility of parallelization.
In this talk, I will focus on batch learning, a particular learning-with-limited-adaptivity model, and show that only O(log log T) batches are needed to achieve the optimal regret for the popular linear contextual bandit problem. Along the way, I will also introduce the distributional optimal design, a natural extension of optimal experimental design in statistical learning, together with our statistically and computationally efficient learning algorithm for it, which may be of independent interest.
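To illustrate the batch-learning model (not the talk's actual algorithm), here is a minimal sketch of a batched linear contextual bandit on synthetic data. The batch endpoints follow the standard grid T^(1 - 2^-i), which yields only O(log log T) batches; within a batch the policy is frozen (here, simply greedy with respect to a ridge estimate), and the estimate is updated only at batch boundaries. The problem sizes, noise level, and greedy policy are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic d-dimensional linear contextual bandit with K arms.
d, K, T = 5, 4, 10_000
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)

# Batch grid T^(1 - 2^-i): only O(log log T) batch boundaries in total.
num_batches = max(2, int(np.ceil(np.log2(np.log2(T)))) + 1)
grid = sorted({int(T ** (1 - 2.0 ** (-i))) for i in range(1, num_batches)} | {T})

lam = 1.0
V = lam * np.eye(d)       # regularized Gram matrix of played contexts
b = np.zeros(d)           # accumulated reward-weighted contexts
theta_hat = np.zeros(d)   # ridge estimate, frozen within each batch

regret = 0.0
t = 0
for end in grid:
    # The policy is fixed for the whole batch: greedy w.r.t. theta_hat.
    # (A real batched algorithm would add an exploration component.)
    while t < end:
        contexts = rng.normal(size=(K, d))
        arm = int(np.argmax(contexts @ theta_hat))
        reward = contexts[arm] @ theta_star + 0.1 * rng.normal()
        regret += float(np.max(contexts @ theta_star)) - contexts[arm] @ theta_star
        V += np.outer(contexts[arm], contexts[arm])
        b += reward * contexts[arm]
        t += 1
    # Adaptivity happens only here, at the batch boundary.
    theta_hat = np.linalg.solve(V, b)
```

With T = 10,000 this grid has only a handful of boundaries, so the learner changes its strategy just a few times over ten thousand rounds, which is what makes parallel execution within each batch possible.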
Time
2021-06-18 16:30-17:00
Speaker
Yuan Zhou, University of Illinois at Urbana-Champaign
Room
Guangdong Hotel Shanghai