Friday, April 21, 2023
Aiming for Generalization, Efficiency and Interpretability in Machine Learning for Speech and Audio
Cem Subakan
Professor, Department of Computer Science and Software Engineering
Time: 1:30 p.m.
Room: PLT-2501
Videoconference: Zoom
Abstract: In recent years, machine learning models for speech and audio have made significant strides with the proliferation of deep learning. However, important open problems remain in improving generalization on out-of-domain data, learning efficiency, and interpretability. In this talk, my goal is to offer a sampling from each of these sub-aims to introduce my recent and ongoing research.
I will first talk about our work on speech separation, where we measure speech separation performance under real-life conditions. I will then move on to our recent work on continual representation learning, where we aim to continually train an audio encoder with minimal continual learning interventions, and on cross-modal representation learning, where we aim to improve cross-modal encoder performance by using unpaired text and audio data. Finally, I will introduce our recent efforts on developing neural network interpretation methods that reconstruct audio data in order to understand audio domain classifiers. I will finish the talk by briefly describing our two audio-domain healthcare applications: Machine Learning for Infant Cry Analysis and Machine Learning for Speech and Language Disorders.
Biography: Cem is an Assistant Professor in the Department of Computer Science and Software Engineering at Université Laval. He is also currently an Affiliate Assistant Professor in the Department of Computer Science and Software Engineering at Concordia University and an invited researcher at Mila-Québec AI Institute. He received his PhD in Computer Science from the University of Illinois at Urbana-Champaign (UIUC) and did a postdoc at Mila-Québec AI Institute and Université de Sherbrooke. He serves as a reviewer for several conferences, including NeurIPS, ICML, ICLR, ICASSP, and MLSP, and for journals such as IEEE Signal Processing Letters (SPL), IEEE Transactions on Audio, Speech, and Language Processing (TASL), and IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI). His research interests include deep learning for source separation and speech enhancement under realistic conditions, neural network interpretability, and latent variable modeling. He received the best paper award at the 2017 edition of the IEEE Machine Learning for Signal Processing (MLSP) conference, as well as the Sabura Muroga Fellowship from the UIUC CS department. He is a core contributor to the SpeechBrain project, leading its speech separation component.
Note: The talk will be given in English.