user785099

Reputation: 5563

MemoryError of running Randomforest in scikit-learn

I am following the Python example given in For Beginners - Bag of Words. However, the following line raises a MemoryError. What might cause this error?

forest = forest.fit(train_data_features, train["sentiment"])

Traceback (most recent call last):
  File "C:/Users/PycharmProjects/Project3/demo4.py", line 60, in <module>
    forest = forest.fit(train_data_features, train["sentiment"])
  File "C:\Users\AppData\Roaming\Python\Python27\site-packages\sklearn\ensemble\forest.py", line 195, in fit
    X = check_array(X, dtype=DTYPE, accept_sparse="csc")
  File "C:\Users\AppData\Roaming\Python\Python27\site-packages\sklearn\utils\validation.py", line 341, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
MemoryError

Upvotes: 3

Views: 2728

Answers (2)

ant

Reputation: 56

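In the tutorial's example the bag of words has 5000 features, and fit() converts the whole feature matrix into a new array (the np.array call at the bottom of the traceback), so it needs a lot of memory. One solution is to reduce the number of features, though this may hurt model performance. Another is to switch from 32-bit Python to 64-bit Python, since a 32-bit process can only address a few GB (typically 2 GB on Windows), and a large contiguous allocation can fail even before that limit is reached.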
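A rough back-of-the-envelope check makes the numbers concrete. This is a minimal sketch, assuming 25,000 training reviews (an assumed figure, not taken from the question), that the features are held as a dense int64 array, and that fit() makes a float32 copy (sklearn's tree DTYPE). It also prints whether the running interpreter is a 32-bit or 64-bit build.

```python
import struct


def dense_matrix_bytes(n_rows, n_cols, itemsize):
    """Bytes needed for a dense n_rows x n_cols array with the given itemsize."""
    return n_rows * n_cols * itemsize


# ASSUMED sizes: 25,000 reviews is a hypothetical count; 5000 is the
# number of bag-of-words features mentioned in the answer.
N_REVIEWS = 25000
N_FEATURES = 5000

# Dense word counts stored as int64 (8 bytes per entry), assumed input to fit().
int64_bytes = dense_matrix_bytes(N_REVIEWS, N_FEATURES, 8)
# The float32 copy (4 bytes per entry) made by check_array inside fit().
float32_bytes = dense_matrix_bytes(N_REVIEWS, N_FEATURES, 4)

# Pointer size in bits: 32 on a 32-bit Python build, 64 on a 64-bit build.
pointer_bits = struct.calcsize("P") * 8

print("Python build: %d-bit" % pointer_bits)
print("dense int64 matrix: %.2f GB" % (int64_bytes / 1e9))
print("float32 copy made by fit(): %.2f GB" % (float32_bytes / 1e9))
print("peak while both exist: %.2f GB" % ((int64_bytes + float32_bytes) / 1e9))
```

Under these assumptions the peak is about 1.5 GB for the feature data alone, which is close to what a 32-bit process can use. Because the size grows linearly with the number of features, halving the feature count roughly halves it.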

Upvotes: 0

mata

Reputation: 69042

MemoryError, as the name says, means you're running out of free memory.

If you're following the tutorial's example code, then there are a few things that could help you:

  • delete variables with del once you no longer need them
    (for example, clean_train_reviews is not needed after line 62)
  • After line 42, only train["sentiment"] is needed; the rest of train can be discarded to free memory
  • Don't read both the training and the test sets at the beginning. The test set is only needed after the forest has been built, and at that point nothing else related to the training set is needed anymore.
  • The whole training part could be wrapped in a function that returns the forest. When the function returns, all the intermediate objects it created are no longer referenced and can be freed.
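The last bullet could look roughly like the sketch below. It assumes scikit-learn's CountVectorizer and RandomForestClassifier as in the tutorial; the function name train_forest, its parameters, and the tiny in-memory dataset are hypothetical stand-ins for the real review files. It returns the fitted vectorizer together with the forest, because the test set later has to be transformed with the same vocabulary. It also passes the sparse matrix from fit_transform straight to fit instead of making it dense, which the accept_sparse="csc" argument in the traceback suggests the forest accepts.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer


def train_forest(reviews, sentiments, max_features=5000, n_estimators=100):
    """HYPOTHETICAL helper: build bag-of-words features and fit a forest.

    Every intermediate object is a local variable, so once this function
    returns, only the returned vectorizer and forest stay referenced and
    the feature matrix can be garbage-collected.
    """
    vectorizer = CountVectorizer(analyzer="word", max_features=max_features)
    # fit_transform returns a scipy.sparse matrix; it is passed to fit()
    # as-is rather than converted to a dense array first.
    train_data_features = vectorizer.fit_transform(reviews)
    forest = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
    forest.fit(train_data_features, sentiments)
    return vectorizer, forest


# Usage with a toy dataset standing in for the real training data.
vectorizer, forest = train_forest(
    ["good movie", "bad movie", "great film", "awful film"],
    [1, 0, 1, 0],
    n_estimators=10,
)

# Only now load and transform the test set, reusing the fitted vocabulary.
test_features = vectorizer.transform(["good film"])
predictions = forest.predict(test_features)
print(predictions)
```

Keeping the training step in a function means you don't have to remember every del by hand: anything not returned is released together.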

Upvotes: 4
