R: Random Forest Model for Regression and Classification

Random forest is an important tool for prediction. It is one of the most popular and powerful supervised machine learning algorithms.

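Advantages of random forest

It works for both classification and regression tasks, though it is mainly used for classification
It can handle missing values and maintain accuracy when part of the data is missing
It is less prone to overfitting than a single decision tree, and adding more trees does not make the forest overfit
It handles large datasets well and often gives high accuracy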

Random Forest Pseudocode

Assume the number of cases in the training set is N. A sample of N cases is drawn at random with replacement (a bootstrap sample) and used as the training set for growing a tree.
If there are M input variables or features, a number m < M is specified such that at each node, m variables are selected at random out of the M. The best split on these m is used to split the node. The value of m is held constant while the forest is grown.
Each tree is grown to the largest extent possible, with no pruning.
New data is predicted by aggregating the predictions of the n trees (majority vote for classification, average for regression).
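The two random draws in the steps above can be sketched in base R. This is only an illustration, not how any particular package implements them: the built-in iris data set, the seed, and the choice m = floor(sqrt(M)) are assumptions made for the example.

```r
set.seed(42)  # fixed seed so the sketch is reproducible

data(iris)
N <- nrow(iris)                              # number of training cases
features <- setdiff(names(iris), "Species")  # input variables
M <- length(features)                        # number of input variables
m <- floor(sqrt(M))                          # assumed choice of m < M

# Step 1: bootstrap sample, N row indices drawn at random with replacement
boot_idx <- sample(N, size = N, replace = TRUE)
boot_data <- iris[boot_idx, ]

# Step 2: at a node, m of the M features are drawn at random (no repeats);
# only these candidates would be searched for the best split
candidate_features <- sample(features, size = m)

print(head(boot_idx))
print(candidate_features)
```

In a full forest, step 1 is repeated once per tree and step 2 once per node of every tree.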

In general, the more trees in the forest, the more robust the prediction: accuracy tends to improve as trees are added and then levels off. The trees in a forest are not built in exactly the same way as a single decision tree. A single tree searches all features for the best split using a criterion such as information gain or the Gini index, whereas each tree in a random forest applies the criterion only to a random subset of the features at every node. In the random forest approach, a large number of decision trees are created, and every new observation is fed into every tree. For classification, the most common outcome across the trees (the majority vote) is used as the final output.
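The aggregation step can be illustrated with a small base R sketch. The per-tree predictions below are made-up values, and the helper name majority_vote is hypothetical. Ties are resolved by taking the first label in alphabetical order, which is a simplification.

```r
# Hypothetical predictions from 5 trees for 3 new observations
# (rows = observations, columns = trees); the values are made up.
class_votes <- matrix(
  c("setosa",     "setosa",     "versicolor", "setosa",    "setosa",
    "virginica",  "versicolor", "virginica",  "virginica", "versicolor",
    "versicolor", "versicolor", "versicolor", "setosa",    "versicolor"),
  nrow = 3, byrow = TRUE
)

# Majority vote for classification: the most frequent label in a row
majority_vote <- function(votes) {
  counts <- table(votes)
  names(counts)[which.max(counts)]
}
class_pred <- apply(class_votes, 1, majority_vote)

# Averaging for regression: hypothetical numeric predictions from 3 trees
reg_preds <- matrix(c(20.1, 21.3, 19.8,
                      30.5, 29.9, 31.0),
                    nrow = 2, byrow = TRUE)
reg_pred <- rowMeans(reg_preds)

print(class_pred)
print(reg_pred)
```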

An error estimate is made from the cases that were not used to build each tree. This is called the OOB (out-of-bag) error estimate, and it is reported as a percentage.
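To see where the out-of-bag cases come from, the following base R sketch draws repeated bootstrap samples and records which cases each sample leaves out. It does not compute an error estimate itself. The sizes N and n_trees are arbitrary values chosen for illustration.

```r
set.seed(1)
N <- 1000       # hypothetical number of training cases
n_trees <- 200  # hypothetical forest size

# For each tree, draw a bootstrap sample and record the share of cases left out
oob_fraction <- numeric(n_trees)
for (t in seq_len(n_trees)) {
  in_bag <- sample(N, size = N, replace = TRUE)
  oob <- setdiff(seq_len(N), in_bag)   # cases this tree never saw
  oob_fraction[t] <- length(oob) / N
}

# The average share is close to (1 - 1/N)^N, a little over one third
print(mean(oob_fraction))
print((1 - 1 / N)^N)
```

Because each tree leaves out roughly a third of the cases, every case can be predicted by the trees that did not train on it, which is what makes an OOB error estimate possible without a separate test set.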

The R package “randomForest” is used to create random forests.
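A minimal classification sketch with this package might look as follows. It assumes the randomForest package is installed. The iris data set, the seed, and the values of ntree and mtry are choices made for this example.

```r
# install.packages("randomForest")  # run once if the package is missing
library(randomForest)

set.seed(123)  # forests are random, so fix the seed for repeatable results

# Grow 500 trees, trying 2 randomly chosen features at each split
rf_model <- randomForest(Species ~ ., data = iris,
                         ntree = 500, mtry = 2, importance = TRUE)

print(rf_model)  # summary includes the OOB error estimate and confusion matrix

# OOB error rate after the last tree (a proportion; multiply by 100 for percent)
oob_error <- rf_model$err.rate[rf_model$ntree, "OOB"]

# Predict classes for observations (a few training rows, for illustration only)
new_obs <- iris[c(1, 51, 101), ]
new_pred <- predict(rf_model, newdata = new_obs)
print(new_pred)
```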

R: Random Forest Model for Regression

Random forest for regression is a modification of bagged decision trees that builds a large collection of de-correlated trees to improve predictive performance.
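As a rough illustration, a regression forest can be fitted with the randomForest package when the response is numeric. The sketch assumes the package is installed. The mtcars data set and the settings are chosen only for the example.

```r
library(randomForest)

set.seed(123)

# A numeric response (mpg) makes randomForest fit a regression forest
rf_reg <- randomForest(mpg ~ ., data = mtcars,
                       ntree = 500, importance = TRUE)

# OOB mean squared error and pseudo R-squared after the last tree
oob_mse <- rf_reg$mse[rf_reg$ntree]
oob_rsq <- rf_reg$rsq[rf_reg$ntree]

# Predictions are the average of the individual tree predictions
reg_pred <- predict(rf_reg, newdata = mtcars[1:3, ])

print(oob_mse)
print(oob_rsq)
print(reg_pred)
```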

