On Trees, Forests and Machines -- or: Do new Brooms Clean Better?

6 Feb 2018 04:00pm to 04:45pm

Seminar

Event Location

University of Melbourne

Evan Williams Theatre G03, Peter Hall Building 160

Melbourne VIC 3053

Australia

Speakers

Professor Andreas Ziegler

Institute of Medical Biometry and Statistics, University of Luebeck

Classical regression models, such as linear or logistic regression, are the standard approach in biostatistics.

In the past decade, the statistical properties of several machine learning approaches, such as random forests and support vector machines, have become better understood. For random forests, for example, results are available on consistency, convergence rates and asymptotic normality. However, machine learning approaches will only be used in practice if they are available in fast, easy-to-use implementations.

In this presentation, I will focus on random forests as a learning machine. In the first part of the presentation, I will give an intuitive introduction to classification trees and probability estimation trees. Trees will then be generalized to random forests, and the statistical properties of random forests will be sketched. A specific problem in machine learning is how probability estimates should be updated to make predictions for other centers or for different time points. In the second part of the presentation, I will show that both a general approach by Elkan and a novel approach specifically developed for random forests can be used for calibrating probability estimates. The approach will be illustrated using data from the German Stroke Study Collaboration.
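To make the updating problem concrete, here is a minimal Python sketch of one well-known adjustment: the base-rate correction from Elkan's 2001 paper on cost-sensitive learning. That this is the "general approach by Elkan" meant in the abstract is an assumption, and the function and parameter names are hypothetical. The novel random-forest-specific approach is not shown, since the abstract gives no details of it.

```python
def adjust_for_base_rate(p, b_old, b_new):
    """Update a probability estimate for a population with a different base rate.

    p      -- estimate from a model trained where the outcome has base rate b_old
    b_old  -- outcome prevalence in the training population (0 < b_old < 1)
    b_new  -- outcome prevalence in the target population (0 < b_new < 1)

    Assumes the distribution of the predictors within each outcome class is
    the same in both populations; only the prevalence differs.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    for name, value in (("b_old", b_old), ("b_new", b_new)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")

    # Elkan's base-rate formula; it is equivalent to rescaling the odds of p
    # by the ratio of the new prior odds to the old prior odds.
    numerator = b_new * (p - p * b_old)
    denominator = b_old - p * b_old + b_new * p - b_old * b_new
    return numerator / denominator


# Hypothetical numbers: a model trained at a center with 50% prevalence,
# applied at a center where the outcome occurs in 10% of patients.
updated = adjust_for_base_rate(0.8, b_old=0.5, b_new=0.1)
print(f"updated probability: {updated:.3f}")
```

Two properties make the formula easy to sanity-check: an estimate equal to the old base rate maps to the new base rate, and estimates of exactly 0 or 1 are left unchanged.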