24 Oct 2019 09:30am to 10:30am

Understanding etiology and treatment effects in observational cohorts: Opportunities and challenges in leveraging –omics and machine learning

Event Location
University of Melbourne
CEB Room 515, 207 Bouverie St
Melbourne VIC 3053
Dr Jon Huang
Singapore Institute for Clinical Sciences

Big data. Multi-omics. Machine learning. Technology continues to increase our ability to both produce and summarize data on human health and function. Taking as a goal the production of knowledge about disease etiology or mechanisms for treatment effects in intensely followed observational cohorts, what are principled approaches to leverage these diverse resources and tools?

I propose that expanding upon a paradigm of target trials or hypothetical experiments may be a practical way forward. Using recent empirical and simulation work on the effects of in vitro fertilization on child growth and cardiometabolic health, I will discuss opportunities and practical challenges presented by causal mediation analyses and treatment effect estimation aided by machine learning (e.g. Targeted Maximum Likelihood Estimation). Opportunities include using epigenomic biomarkers that are hypothesized to have no effects (“negative control mediators”) to strengthen inference, and gaining efficiency from semiparametric estimation. Conceptual and analytic challenges include defining target populations from observational data and the possibility of increased bias and anticonservative standard error estimates when nonparametric algorithms are used in the presence of practical positivity violations, which are common in real data.


Dr Jon Huang is an epidemiologist who leads the Biostatistics Platform at the Singapore Institute for Clinical Sciences. He completed an MPH and PhD at the University of Washington and postdoctoral work at McGill University, focusing mainly on birth cohorts, causal inference, and molecular epidemiology, before moving to Singapore in 2018. His current work involves leading and supporting a diverse range of projects, including: prediction of maternal depressive symptom scores via ensemble machine learning; estimating the role of maternal breast milk composition in mediating the effects of maternal dysglycemia on infant adiposity; and a multi-site peri-conception nutritional supplement RCT.