Reputation: 6369
I am trying to run a sklearn random forest classification on 279,900 instances with 5 attributes and 1 class. But I am getting a memory allocation error (MemoryError) at the fit line when trying to run the classification; it is not able to train the classifier itself. Any suggestions on how to resolve this issue?
The data is:
x, y, day, week, Accuracy
x and y are the coordinates, day is the day of the month (1-30), week is the day of the week (1-7), and Accuracy is an integer.
code:
import csv
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Read the class labels (column 8) from the CSV.
with open("time_data.csv", "rb") as infile:
    re1 = csv.reader(infile)
    result = []
    ##next(reader, None)
    ##for row in reader:
    for row in re1:
        result.append(row[8])

trainclass = result[:251900]
testclass = result[251901:279953]

# Read the five feature columns from the same file.
with open("time_data.csv", "rb") as infile:
    re = csv.reader(infile)
    coords = [(float(d[1]), float(d[2]), float(d[3]), float(d[4]), float(d[5])) for d in re if len(d) > 0]

train = coords[:251900]
test = coords[251901:279953]
print "Done splitting data into test and train data"

clf = RandomForestClassifier(n_estimators=500, max_features="log2", min_samples_split=3, min_samples_leaf=2)
clf.fit(train, trainclass)
print "Done training"
score = clf.score(test, testclass)
print "Done Testing"
print score
Error:
line 366, in fit
builder.build(self.tree_, X, y, sample_weight, X_idx_sorted)
File "sklearn/tree/_tree.pyx", line 145, in sklearn.tree._tree.DepthFirstTreeBuilder.build
File "sklearn/tree/_tree.pyx", line 244, in sklearn.tree._tree.DepthFirstTreeBuilder.build
File "sklearn/tree/_tree.pyx", line 735, in sklearn.tree._tree.Tree._add_node
File "sklearn/tree/_tree.pyx", line 707, in sklearn.tree._tree.Tree._resize_c
File "sklearn/tree/_utils.pyx", line 39, in sklearn.tree._utils.safe_realloc
MemoryError: could not allocate 10206838784 bytes
Upvotes: 3
Views: 7064
Reputation: 11
Please try Google Colaboratory. You can connect to either a local or a hosted runtime. It worked for me with n_estimators=10000.
Upvotes: 1
Reputation: 169
I ran into the same MemoryError recently, but I fixed it by reducing the training data size instead of modifying my model parameters. My OOB value was 0.98, meaning the model is very unlikely to be overfitting.
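A minimal sketch of that approach, reusing the train/trainclass lists from the question (the 50% subsample fraction and the random seed are illustrative, not tuned):

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Subsample the training set to cut peak memory during fit;
# oob_score=True gives an out-of-bag accuracy estimate to check for overfitting.
rng = np.random.RandomState(0)
idx = rng.choice(len(train), size=len(train) // 2, replace=False)
train_small = [train[i] for i in idx]
trainclass_small = [trainclass[i] for i in idx]

clf = RandomForestClassifier(n_estimators=500, oob_score=True)
clf.fit(train_small, trainclass_small)
print(clf.oob_score_)  # a value near 1.0 suggests the model is not overfitting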
Upvotes: 0
Reputation: 793
From the scikit-learn docs: "The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values."
I would try to adjust those parameters, then, as in the sketch below. You can also try a memory profiler, or run it on Google Colaboratory if your machine has too little RAM.
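A hedged sketch with those size-limiting parameters set explicitly (all values below are illustrative, not tuned for this data set):

from sklearn.ensemble import RandomForestClassifier

# Shallower trees and larger leaves mean fewer nodes per tree, which is
# exactly the array that failed to allocate in the traceback.
clf = RandomForestClassifier(
    n_estimators=100,      # fewer trees than the 500 in the question
    max_depth=20,          # the default (None) grows trees until leaves are pure
    min_samples_split=10,  # require more samples before splitting a node
    min_samples_leaf=5,    # larger leaves mean smaller trees
    max_features="log2",
)
clf.fit(train, trainclass)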
Upvotes: 1