Statistics & Data Science Seminar Series Presents: Distinguished Speaker Xihong Lin

Xihong Lin, Professor of Biostatistics and Chair and Professor of Statistics at Harvard University

Harnessing Synthetic Data from Generative AI for Statistical Inference

Integration of statistics and generative AI plays a pivotal role in accelerating trustworthy cross-domain scientific discovery. Recent advances in generative models have dramatically increased the availability and use of synthetic data across scientific domains. While these developments create exciting opportunities for empowering data analysis, they also raise fundamental statistical challenges regarding how synthetic data can be used in a valid, reliable, and principled manner. In this talk, we first discuss the current landscape of synthetic data generation using generative AI models such as transformer- and diffusion-based models. More importantly, we present a principled framework for incorporating synthetic data in downstream statistical analysis that ensures valid statistical inference even when generative AI models are misspecified. We show that the proposed synthetic-data-assisted methods, which integrate observed and synthetic data, are robust to misspecified black-box generative models and can improve statistical inferential power when the generative AI models are informative. We demonstrate the utility of these methods in the analysis of UK Biobank data by performing genome-wide association studies (GWAS) of proteomic data and whole-genome sequencing (WGS) analyses of brain imaging phenotypes, both characterized by substantial missingness (about 90%).

Dr. Xihong Lin is a world-renowned statistician and a leading figure in biostatistics and quantitative genomics. She is a Professor of Biostatistics at the Harvard T.H. Chan School of Public Health, and currently serves as Chair of the Department of Statistics at Harvard University.
Dr. Lin’s research interests lie in the development and application of scalable statistical and machine learning/AI methods for the analysis of massive and complex genetic, genomic, epidemiological, and health data. Dr. Lin was elected to the National Academy of Medicine in 2018 and the National Academy of Sciences in 2023. She received the 2002 Mortimer Spiegelman Award and the 2025 Lowell Reed Lecture Award from the American Public Health Association, the 2006 Committee of Presidents of Statistical Societies (COPSS) Presidents’ Award, the 2017 COPSS FN David Award, the 2008 Janet L. Norwood Award for Outstanding Achievement of a Woman in Statistics, the 2022 National Institute of Statistical Sciences Jerome Sacks Award for Outstanding Cross-Disciplinary Research, and the 2022 Marvin Zelen Leadership in Statistical Science Award. She is an elected fellow of the American Statistical Association (ASA), the Institute of Mathematical Statistics, the International Statistical Institute, and the American Association for the Advancement of Science.

Host: Ran Chen

A reception will follow the talk in the Weidenbaum Suite, located in Seigle Hall 170.