Clustering High Dimensional Dynamic Data Streams (Lin Yang)

Abstract

We present a data streaming algorithms for the k-median problem in high-dimensional dynamic geometric data streams, i.e. streams allowing both insertions and deletions of points from a discrete Euclidean space {1,2,…Δ}^d. Our algorithms use k/eps^2 * poly(d log Δ) space/time and maintain with high probability a small weighted set of points (a coreset) such that for every set of k centers the k-median cost of the coreset (1+eps)-approximates the cost of the streamed point set. We can use this coreset to compute a (1+eps)-approximation for the k-median problem by any efficient offline k-median algorithm. This is the first algorithm achieving space polynomial in d in this setting.

Time

2018-08-09   14:00 ~ 15:00   

Speaker

Lin Yang, Princeton university

Room

Room 602,School of Information Management & Engineering, Shanghai University of Finance & Economics