Reputation: 455
I am running the following code to create and fit a GaussianNB classifier:
features_train, features_test, labels_train, labels_test = preprocess()
### compute the accuracy of your Naive Bayes classifier
# import the sklearn module for GaussianNB
from sklearn.naive_bayes import GaussianNB
import numpy as np
### create classifier
clf = GaussianNB()
### fit the classifier on the training features and labels
clf.fit(features_train, labels_train)
Running the above locally:
>>> runfile('C:/.../naive_bayes')
no. of Chris training emails: 4406
no. of Sara training emails: 4383
>>> clf
GaussianNB()
I believe this shows that preprocess() works, because features_train, features_test, labels_train, and labels_test all load successfully.
When I call clf.score or clf.predict, I get a MemoryError:
>>> clf.predict(features_test)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\sklearn\naive_bayes.py", line 64, in predict
    jll = self._joint_log_likelihood(X)
  File "C:\Python27\lib\site-packages\sklearn\naive_bayes.py", line 343, in _joint_log_likelihood
    n_ij -= 0.5 * np.sum(((X - self.theta_[i, :]) ** 2) /
MemoryError
>>> clf.score(features_test, labels_test)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\sklearn\base.py", line 295, in score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
  File "C:\Python27\lib\site-packages\sklearn\naive_bayes.py", line 64, in predict
    jll = self._joint_log_likelihood(X)
  File "C:\Python27\lib\site-packages\sklearn\naive_bayes.py", line 343, in _joint_log_likelihood
    n_ij -= 0.5 * np.sum(((X - self.theta_[i, :]) ** 2) /
MemoryError
I do not think my machine is running out of memory: Task Manager shows no spike in RAM, and usage stays well below what the machine has available.
I suspect it has something to do with the Python version and the library versions.
Any help in diagnosing this is appreciated. I can provide more info as needed.
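One way to sanity-check whether the failing line could plausibly exhaust memory is to estimate the size of the temporary arrays it builds. The traceback ends inside an expression of the form (X - self.theta_[i, :]) ** 2, and each intermediate result there has the same shape as X. The sketch below assumes X is a dense float64 array (8 bytes per value); the helper name temp_array_megabytes and the example shape are hypothetical, not taken from the actual data.

```python
def temp_array_megabytes(n_samples, n_features, itemsize=8):
    """Estimate the size in MiB of one dense temporary shaped like X.

    itemsize=8 assumes float64 values; pass 4 for float32.
    """
    return n_samples * n_features * itemsize / (1024.0 ** 2)


# Hypothetical shape, for illustration only. With the real data, pass
# features_test.shape[0] and features_test.shape[1] instead.
print("one temporary: %.1f MiB" % temp_array_megabytes(1024, 1024))
```

If the estimate for the real shape comes out in the hundreds of megabytes, a few such temporaries alive at once may be enough to fail in a constrained process even when Task Manager shows plenty of free RAM.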
Upvotes: 1
Views: 1619
Reputation: 1593
I'm taking the same Udacity course and had exactly the same problem. I installed 64-bit Anaconda and ran the script inside Spyder, and everything worked as expected.
Upvotes: 0
Reputation: 455
I believe I answered my own question after reading some related posts online (not previously answered Stack Overflow posts).
The key for me was simply to move to 64-bit Python via Anaconda. All of the MemoryError issues were resolved when the exact same code that had failed in 32-bit Python was rerun in 64-bit Python. To the best of my understanding, this was the only variable that changed.
Perhaps this is not a very satisfying answer, but it would be nice if this question could remain for others who search for the same sklearn MemoryError problem in the future.
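For anyone who wants to confirm which build they are running before switching, a minimal check is sketched below. The helper name python_bits is my own and not part of any library; it uses only the standard struct and sys modules.

```python
import struct
import sys


def python_bits():
    """Return the pointer width of the running interpreter in bits."""
    # struct.calcsize("P") is the size in bytes of a C pointer for this build:
    # 4 on a 32-bit interpreter, 8 on a 64-bit one.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python %s, %d-bit build" % (sys.version.split()[0], python_bits()))
```

A result of 32 means the interpreter is a 32-bit build, whose address space is limited regardless of how much RAM the machine has, which would be consistent with seeing a MemoryError without a visible spike in Task Manager.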
Upvotes: 1