Reputation: 323
I am going to use scikit-learn for my SVM classification implementation.
My features take the values 0/1. I have saved them in one txt file and the labels in a separate txt file.
My problem is: how can I load this external data set for the training and test phases using scikit-learn?
Upvotes: 0
Views: 1572
Reputation: 210982
Saving vectorized, and especially compressed (sparse), data in a TXT/CSV file is not the best approach, as you might have problems when reading it back: you will lose dtypes, compression/"sparseness", etc. You may even run into cases where the TXT/CSV file cannot be read into memory at all.
Here you can see an example where converting a sparse matrix to a normal (NumPy) one ends with a MemoryError. The same can happen to you if you save your sparse (compressed) matrix to CSV and then try to read it back (uncompressed).
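To get a rough feel for the size difference, here is a small sketch with a randomly generated SciPy sparse matrix. The shape and density are made-up values for illustration, not taken from the question. It compares the memory held by the CSR arrays with what a dense array of the same shape would need, without actually allocating the dense array.

```python
import numpy as np
from scipy import sparse

# Hypothetical 0/1 feature matrix: 10,000 samples x 5,000 features, 0.1% non-zero.
X = sparse.random(10_000, 5_000, density=0.001, format="csr",
                  dtype=np.float64, random_state=0)
X.data[:] = 1.0  # make the stored entries binary

# Memory actually used by the three arrays that back a CSR matrix.
sparse_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes

# Memory a dense float64 array of the same shape would need (computed, not allocated).
dense_bytes = X.shape[0] * X.shape[1] * np.dtype(np.float64).itemsize

print(f"sparse: {sparse_bytes / 1e6:.2f} MB, dense: {dense_bytes / 1e6:.2f} MB")
```

Writing such a matrix out as CSV stores every zero explicitly, so reading it back produces the dense size rather than the sparse one.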
So I would recommend pickling instead:
saving / serializing your data:
# sklearn.externals.joblib was removed in scikit-learn 0.23; use the standalone joblib package
import joblib
joblib.dump(clf, 'filename.pkl')
where clf is your trained model or another sparse/compressed data structure.
reading it back from disk:
import joblib
clf = joblib.load('filename.pkl')
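Applied to the setup in the question, this could look roughly like the sketch below. It reads 0/1 features and labels from two text files with NumPy, keeps the features as a SciPy sparse matrix, trains an SVM, and round-trips both objects through joblib. The file names, the space-separated one-sample-per-line layout, and the tiny data set are all assumptions made up for the example. It writes its own sample files to a temporary directory so that it runs as-is.

```python
import os
import tempfile

import joblib
import numpy as np
from scipy import sparse
from sklearn.svm import SVC

tmp = tempfile.mkdtemp()
features_path = os.path.join(tmp, "features.txt")  # hypothetical file name
labels_path = os.path.join(tmp, "labels.txt")      # hypothetical file name

# Write a tiny made-up data set: one sample per line, 0/1 values separated by spaces.
with open(features_path, "w") as f:
    f.write("1 0 1\n0 1 0\n1 1 1\n0 0 0\n1 0 0\n0 1 1\n")
with open(labels_path, "w") as f:
    f.write("1\n0\n1\n0\n1\n0\n")

# Load the text files; int8 is enough for 0/1 values.
X = np.loadtxt(features_path, dtype=np.int8)
y = np.loadtxt(labels_path, dtype=np.int8)

# Keep the features sparse and persist them in binary form.
X_sparse = sparse.csr_matrix(X, dtype=np.float64)
joblib.dump(X_sparse, os.path.join(tmp, "features.pkl"))

# Train a linear SVM and persist the fitted model.
clf = SVC(kernel="linear")
clf.fit(X_sparse, y)
joblib.dump(clf, os.path.join(tmp, "model.pkl"))

# Read both back from disk and predict with the restored model.
X_loaded = joblib.load(os.path.join(tmp, "features.pkl"))
clf_loaded = joblib.load(os.path.join(tmp, "model.pkl"))
print(clf_loaded.predict(X_loaded))
```

The text files are parsed only once here. After that, the training and test phases can both start from the .pkl files, which preserve the dtype and the sparse layout.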
Upvotes: 2