Institute for Theoretical Computer Science, SUFE 上海财经大学理论计算机科学研究中心

We first introduce the classic multi-armed bandit model, and survey some recent results for studying the optimal sample complexity for the pure exploration multi-armed bandit problems. I will spend most of my time to talk about the combinatorial pure exploration (CPE) problem, which is a very general problem in the stochastic multi-armed bandit model. In a CPE instance, we are given n stochastic arms with unknown reward distributions, as well as a family F of feasible subsets over the arms. Let the weight of an arm be the mean of its reward distribution. Our goal is to identify the feasible subset in F with the maximum total weight, using as few samples as possible.

We first show a nearly optimal sampling algorithm for CPE when F is a matroid. This special case already generalizes the classical best arm identification problem and the top-k arm identification problem, both of which have attracted significant attention in recent years. Central to our algorithm is Karger’s sample-and-prune technique, which was originally developed for the celebrated randomized linear time algorithm for MST.

For the general CPE, we provide a novel instance-wise lower bound for the sample complexity of the problem, as well as a nontrivial sampling algorithm, matching the lower bound up to a factor of ln|F|. For an important class of combinatorial families (including spanning trees, matchings, and path constraints), we also provide polynomial time implementation of the sampling algorithm, using the equivalence of separation and optimization for convex program, and the notion of approximate Pareto curves in multi-objective optimization (note that |F| can be exponential in n). We also show that the ln|F| factor is inevitable in general, through a nontrivial lower bound construction utilizing a combinatorial structure resembling the Nisan-Wigderson design. Our results significantly improve several previous results for several important combinatorial constraints, and provide a tighter understanding of the general CPE problem.

We further introduce an even more general problem, formulated in geometric terms, which generalizes most pure exploration bandit problems studied in the literature. It is closely related to active hypothesis testing and sequential experimental design. We provide the first nearly optimal sample complexity upper and lower bounds for the problem.

On optimal sampling complexity for combinatorial multi-armed bandits (Jian Li)