S.P. Chistiakov (С.П. Чистяков). Random forests: an overview (Russian title: Случайные леса: обзор) // Transactions of Karelian Research Centre of Russian Academy of Science (Труды КарНЦ РАН). No 1. Ser. Mathematical Modeling and Information Technologies. Vol. 4. 2013. Pp. 117-136
Keywords: decision trees, classifier ensembles, bagging, random forests, classification, regression, clustering, R package.
This paper presents an overview of the state of the art in the study of random forests, a statistical method designed for classification and regression problems. We outline the history of decision trees and classifier ensembles and describe the underlying basic ideas (impurity, split, bagging, boosting, etc.). Some issues concerning the consistency of the method are considered. The applicability of random forests to finding the most informative features, clustering, and finding outlier observations and class prototypes is surveyed. Several non-classical variants of decision trees and random forests are considered, namely oblique trees, survival random forests, quantile regression forests, logical random forests, probabilistic random forests and streaming random forests. We also survey the corresponding software with an emphasis on R, an open-source environment for statistical computing and graphics that is freely available for the Linux, Windows and Macintosh platforms.
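The basic ideas named above (impurity, split, bagging) can be illustrated with a toy random-forest classifier. This is a minimal sketch of our own, not the method or software described in the paper. It assumes numeric features and class labels that are not tuples, uses the Gini index as the impurity measure, tries `m` randomly chosen features at each node, and caps tree depth with a hypothetical `max_depth=5` parameter (Breiman's original forests grow unpruned trees). All function names are illustrative.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most frequent label (ties broken by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]


def best_split(rows, labels, features):
    """Return the (feature, threshold) pair among `features` whose split
    x[feature] <= threshold minimizes the weighted Gini impurity of the two
    children, or None if no split strictly reduces the parent's impurity."""
    best, best_score = None, gini(labels)
    n = len(labels)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow_tree(rows, labels, m, rng, depth=0, max_depth=5):
    """Grow a tree, trying only m randomly chosen features at every node.
    Leaves are class labels (assumed not to be tuples); internal nodes are
    (feature, threshold, left_subtree, right_subtree)."""
    split = None
    if depth < max_depth and gini(labels) > 0:
        candidates = rng.sample(range(len(rows[0])), m)
        split = best_split(rows, labels, candidates)
    if split is None:
        return majority(labels)
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            grow_tree([r for r, _ in left], [y for _, y in left], m, rng, depth + 1, max_depth),
            grow_tree([r for r, _ in right], [y for _, y in right], m, rng, depth + 1, max_depth))


def predict_tree(node, x):
    """Follow the splits from the root down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(rows, labels, n_trees=25, m=1, seed=0):
    """Bagging: each tree is grown on a bootstrap sample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(grow_tree([rows[i] for i in idx], [labels[i] for i in idx], m, rng))
    return forest


def predict_forest(forest, x):
    """Classify x by a majority vote over the trees."""
    return majority([predict_tree(tree, x) for tree in forest])


if __name__ == "__main__":
    X = [(i, (7 * i) % 3) for i in range(10)]  # feature 0 informative, feature 1 noise
    y = [0 if i < 5 else 1 for i in range(10)]
    forest = fit_forest(X, y, n_trees=25, m=1, seed=0)
    print(predict_forest(forest, (0, 0)), predict_forest(forest, (9, 0)))
```

Each tree sees a different bootstrap sample and a different random feature subset at every node; combining their votes, rather than relying on one tree, is the ensemble idea behind bagging and random forests. For real data one would use an established implementation such as those surveyed in the paper.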