How to load a data set stored in txt files in scikit-learn

Question

I am going to use scikit-learn libraries for my SVM implementation for classification.

My feature values are 0/1. I have saved the features in one txt file and the labels in a separate txt file.

My problem now is: how can I load this external data set for the training and test phases using scikit-learn?

MaxU - stand with Ukraine · Accepted Answer

Saving vectorized and especially compressed (sparse) data in a TXT/CSV file is not the best approach, as you might run into problems when reading it back: you will lose dtypes, compression/"sparseness", etc. You may even encounter cases where you cannot read your TXT/CSV file into memory.
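As a small illustration of the dtype loss, here is a minimal sketch of a text round trip with NumPy. The file name `features.txt` and the tiny array are made up for the example. By default `np.loadtxt` parses values as floats, so the compact integer dtype does not survive unless you pass `dtype` explicitly.

```python
import os
import tempfile

import numpy as np

# Tiny 0/1 feature matrix stored in a compact integer dtype (illustrative data).
X = np.array([[0, 1, 1], [1, 0, 0]], dtype=np.int8)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "features.txt")  # hypothetical file name
    np.savetxt(path, X, fmt="%d")             # write as plain text
    X_back = np.loadtxt(path)                 # read back with default settings

# The values are the same, but the dtype of the reloaded array differs.
print(X.dtype, X_back.dtype)
```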

Here you can see an example where converting a sparse matrix to a normal (NumPy) one ends with a MemoryError. This may happen to you if you save your sparse (compressed) matrix to CSV and then try to read it back (uncompressed).
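To get a feeling for why the dense version may not fit in memory, the following sketch compares the memory held by a SciPy CSR matrix with what a dense array of the same shape would need. The shape and density are arbitrary assumptions chosen for illustration, and the dense array is only sized, never allocated.

```python
import numpy as np
from scipy import sparse

# Hypothetical shape and density, chosen only for illustration.
n_rows, n_cols = 10_000, 5_000
X_sparse = sparse.random(
    n_rows, n_cols, density=0.001, format="csr", dtype=np.float64, random_state=0
)

# Memory actually held by the three arrays of the CSR representation.
sparse_bytes = (
    X_sparse.data.nbytes + X_sparse.indices.nbytes + X_sparse.indptr.nbytes
)

# Memory a dense float64 array of the same shape would need (not allocated here).
dense_bytes = n_rows * n_cols * np.dtype(np.float64).itemsize

print(f"sparse: {sparse_bytes / 1e6:.1f} MB, dense: {dense_bytes / 1e6:.1f} MB")
```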

So I would recommend that you use pickling:

saving / serializing your data:

import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
joblib.dump(clf, 'filename.pkl')

where clf is your trained model or another sparse/compressed data structure

reading it back from disk:

import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
clf = joblib.load('filename.pkl')
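If the data already lives in txt files, as in the question, a minimal end-to-end sketch could look like the one below. It assumes whitespace-delimited files with one sample per row, the standalone `joblib` package, and hypothetical file names (`features.txt`, `labels.txt`). The tiny demo files are written first only so that the sketch runs on its own.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.svm import SVC

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names; replace them with your own paths.
    features_path = os.path.join(tmp, "features.txt")
    labels_path = os.path.join(tmp, "labels.txt")
    model_path = os.path.join(tmp, "filename.pkl")

    # Write tiny demo files (made-up 0/1 features and labels).
    np.savetxt(features_path, [[0, 1, 1], [1, 0, 0], [0, 1, 0], [1, 0, 1]], fmt="%d")
    np.savetxt(labels_path, [1, 0, 1, 0], fmt="%d")

    # Load the text files; an explicit dtype keeps the values as integers.
    X = np.loadtxt(features_path, dtype=np.int8)
    y = np.loadtxt(labels_path, dtype=np.int64)

    # Train an SVM classifier, then persist it and read it back with joblib.
    clf = SVC(kernel="linear").fit(X, y)
    joblib.dump(clf, model_path)
    clf_loaded = joblib.load(model_path)

    # The reloaded model should give the same predictions as the original.
    same = np.array_equal(clf.predict(X), clf_loaded.predict(X))

print(same)
```

For large or sparse feature matrices, the same `joblib.dump` / `joblib.load` calls can be applied to the matrix itself instead of going through a text file.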
