Ensemble modeling is the process of running two or more related but different analytical models and then synthesizing the results into a single score or spread in order to improve the accuracy of predictive analytics and data mining applications.
In predictive modeling and other types of data analytics, a single model based on one data sample can have biases, high variability or outright inaccuracies that affect the reliability of its analytical findings. Using specific modeling techniques can present similar drawbacks. By combining different models or analyzing multiple samples, data scientists and other data analysts can reduce the effects of those limitations and provide better information to business decision makers.
One common example of ensemble modeling is a random forest model. This approach to data mining leverages multiple decision trees, a type of analytical model that’s designed to predict outcomes based on different variables and rules. A random forest model blends decision trees that may analyze different sample data, evaluate different factors or weight common variables differently. The results of the various decision trees are then either converted into a simple average or aggregated through further weighting.
Ensemble modeling has grown in popularity as more organizations have deployed the computing resources and advanced analytics software needed to run such models. In addition, the emergence of Hadoop and other big data technologies has led businesses to store and analyze greater volumes of data, creating increased potential for running analytical models on different data samples.