Résumé: In the first part of this talk, I will present the main ideas underlying the PAC-Bayesian learning theory – which provides statistical guarantees on the generalization of a weighted majority vote of many classifiers – using a simplified approach. This approach leads to a general theorem that embraces several existing PAC-Bayesian results. Moreover, our proof process eases the "customization" of PAC-Bayesian theorems. In particular, I will show how this result can be extended to the transductive framework, and to theorems where the common Kullback-Leibler divergence (which, in usual PAC-Bayesian theorems, quantifies the "distance" between a prior and a posterior distribution over the voting classifiers) is replaced by the Rényi divergence. In the second part, I will exhibit a strong link between frequentist PAC-Bayesian bounds and the Bayesian marginal likelihood. That is, for the negative log-likelihood loss function, the minimization of PAC-Bayesian generalization bounds maximizes the Bayesian marginal likelihood. This provides an alternative explanation to the Bayesian Occam’s razor criteria.
Note: La présentation sera donnée en français.