Abstract: Transfer learning aims to use data from a ‘source’ distribution S to improve learning performance with respect to a target distribution T. One can for instance think of having a large database of images from the web (the source) to be used towards classifying images from a specific domain, say traffic in New York (the target). Much research effort has therefore gone into understanding which relations between S and T allow information transfer from S to T in such prediction tasks and, more practically, into understanding the relative benefits of source and target data.
In this talk we consider ‘nonparametric’ settings, i.e., we make few assumptions about S and T, and instead aim to characterize the pair (S,T) so as to capture a continuum from easy to hard transfer problems, and furthermore to tightly capture the relative benefits of source and target data.
In the first part of the talk, we will discuss nonparametric insights on importance sampling methods, a very common algorithmic approach to transfer that involves the estimation of density ratios. We show that these methods can greatly benefit from structured data (such as manifold or sparse data), which attests to their practical success. However, they might be solving too hard a problem in some situations where transfer is easy, and they are ill-defined in common situations where transfer is hard but still possible at reasonable rates.
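For readers unfamiliar with importance sampling, the following minimal Python sketch illustrates the basic idea of reweighting source data by a density ratio. It is only an illustration under strong assumptions and not the method analyzed in the talk: the source S = N(0, 1) and target T = N(1, 1) are hypothetical, both densities are assumed known (in practice the ratio must be estimated), and the helper names `gaussian_pdf` and `importance_weighted_mean` are made up for this example.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weighted_mean(values, weights):
    """Self-normalized importance-weighted average of `values`."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


rng = random.Random(0)

# Hypothetical setup: source S = N(0, 1), target T = N(1, 1).
source_sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# Density ratio dT/dS at each source point. Assumed known here;
# importance sampling methods for transfer estimate it from data.
weights = [
    gaussian_pdf(x, 1.0, 1.0) / gaussian_pdf(x, 0.0, 1.0)
    for x in source_sample
]

# The naive average estimates the source mean (0), while the
# reweighted average targets the mean under T (1).
naive = sum(source_sample) / len(source_sample)
weighted = importance_weighted_mean(source_sample, weights)

print("naive source average:   ", round(naive, 3))
print("importance-weighted avg:", round(weighted, 3))
```

Note that the ratio dT/dS only exists when T puts no mass where S has none, which is one way to read the remark above that such methods can be ill-defined even when transfer remains possible.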
In the second part of the talk, I will argue that an asymmetric notion of relative dimension between S and T tightly captures the minimax rates of transfer. In particular, this notion reveals a rich set of situations where transfer is possible even at fast rates, even though traditional measures of divergence between S and T might remain large or undefined. Surprisingly, our minimax analysis shows that unlabeled target data have only minor benefits in transfer, while a small amount of labeled target data can greatly improve the rates of transfer. Furthermore, we are able to characterize sharp thresholds at which the benefits of source data saturate given the available target data. Finally, we show that such a threshold (a priori unknown) can nearly be achieved by an adaptive sampling procedure with no knowledge of distributional parameters.
The talk is partly based on recent work with Guillaume Martinet.
Biography: Samory Kpotufe is an Assistant Professor at ORFE, Princeton University, and works at the intersection of Machine Learning and modern statistical areas such as Nonparametrics and High-dimensional Inference. He obtained his PhD in Computer Science at UC San Diego in 2010, followed by postdoctoral research positions at the MPI for Intelligent Systems and the Toyota Technological Institute at Chicago.
Note: The presentation will be given in English.