ximiki

Reputation: 455

Python sklearn GaussianNB : "MemoryError" but no leads on how to fix

I am running the following code to create and fit a GaussianNB classifier:

features_train, features_test, labels_train, labels_test = preprocess()

### compute the accuracy of your Naive Bayes classifier
# import the sklearn module for GaussianNB 
from sklearn.naive_bayes import GaussianNB 
import numpy as np

### create classifier 
clf = GaussianNB()

### fit the classifier on the training features and labels    
clf.fit(features_train, labels_train)

Running the above locally:

>>> runfile('C:/.../naive_bayes')
no. of Chris training emails: 4406
no. of Sara training emails: 4383
>>> clf
GaussianNB()

I believe this shows that "preprocess()" works, because features_train, features_test, labels_train, and labels_test all load successfully.

When I call clf.score or clf.predict, I get a MemoryError:

>>> clf.predict(features_test)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\sklearn\naive_bayes.py", line 64, in predict
    jll = self._joint_log_likelihood(X)
  File "C:\Python27\lib\site-packages\sklearn\naive_bayes.py", line 343, in _joint_log_likelihood
    n_ij -= 0.5 * np.sum(((X - self.theta_[i, :]) ** 2) /
MemoryError
>>> clf.score(features_test,labels_test)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\sklearn\base.py", line 295, in score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
  File "C:\Python27\lib\site-packages\sklearn\naive_bayes.py", line 64, in predict
    jll = self._joint_log_likelihood(X)
  File "C:\Python27\lib\site-packages\sklearn\naive_bayes.py", line 343, in _joint_log_likelihood
    n_ij -= 0.5 * np.sum(((X - self.theta_[i, :]) ** 2) /
MemoryError

I do not think my machine is running out of memory: Task Manager shows no spike in RAM, and usage stays well below what the machine has available.

I suspect it has something to do with the Python version and the library versions.

Any help in going about diagnosing this is appreciated. I can provide more info as needed.

Upvotes: 1

Views: 1619

Answers (2)

Danilo Souza Morães

Reputation: 1593

I'm also taking that same Udacity course and I had exactly the same problem. I installed 64-bit Anaconda and ran the script inside Spyder, and everything worked as expected.

Upvotes: 0

ximiki

Reputation: 455

I believe I have answered my own question after reading some related posts online (I did not rely on previously answered Stack Overflow posts).

The key for me was simply to move to 64-bit Python via Anaconda. All 'MemoryError' issues were resolved when the exact same code that had failed in 32-bit Python was rerun in 64-bit. To the best of my understanding, this was the only variable that changed.
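For anyone unsure which interpreter is actually running their script, the pointer size can be checked from Python itself. A 32-bit process can only address a few gigabytes no matter how much RAM is installed, which would be consistent with a MemoryError appearing without any visible spike in Task Manager. The snippet below is a minimal sketch using only the standard library; the helper name python_bitness is hypothetical and is introduced here purely for illustration.

```python
import struct
import sys


def python_bitness():
    """Return the pointer size of the running interpreter in bits (32 or 64)."""
    # struct.calcsize("P") is the size in bytes of a C pointer (void *).
    return struct.calcsize("P") * 8


bits = python_bitness()
# sys.maxsize is 2**31 - 1 on a 32-bit build and 2**63 - 1 on a 64-bit build.
print("Python %d-bit, sys.maxsize = %d" % (bits, sys.maxsize))
```

If this reports 32-bit, installing a 64-bit distribution (as done here with Anaconda) and rerunning the same code is the change that resolved the error in my case.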

Perhaps this is not a very satisfying answer, but it would be nice if this question could remain for others who search for the exact same sklearn MemoryError problem in the future.
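For readers who cannot switch interpreters, one possible workaround is to predict on slices of the test set, so that the temporary arrays built inside predict are smaller. I have not verified this against the 32-bit setup described in the question, so treat it as a sketch under that assumption. The function name predict_in_chunks and the default chunk_size of 500 are hypothetical choices, and X is assumed to be a non-empty 2-D NumPy array.

```python
import numpy as np


def predict_in_chunks(clf, X, chunk_size=500):
    """Call clf.predict on consecutive row slices of X and join the results.

    Assumes X is a non-empty 2-D array and clf has a predict(X) method
    that returns one label per row.
    """
    parts = [
        clf.predict(X[start:start + chunk_size])
        for start in range(0, X.shape[0], chunk_size)
    ]
    # Each part is a 1-D array of labels; concatenate restores row order.
    return np.concatenate(parts)
```

With the variables from the question, this would be called as predict_in_chunks(clf, features_test), and the accuracy could then be computed by comparing the result with labels_test.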

Upvotes: 1
