Reputation: 2789
I am trying to train (fit) a random forest classifier using Python and scikit-learn on a set of data stored as feature vectors. I can read the data, but training the classifier fails with a ValueError. The source code that I am using is the following:
from sklearn.ensemble import RandomForestClassifier
from numpy import genfromtxt
my_training_data = genfromtxt('csv-data.txt', delimiter=',')
X_train = my_training_data[:,0]
Y_train = my_training_data[:,1:my_training_data.shape[1]]
clf = RandomForestClassifier(n_estimators=50)
clf = clf.fit(X_train.tolist(), Y_train.tolist())
The error returned to me is the following:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/sklearn/ensemble/forest.py", line 260, in fit
    n_samples, self.n_features_ = X.shape
ValueError: need more than 1 value to unpack
The csv-data.txt file is a comma-separated values file containing 3996 vectors for training the classifier. The first value in each row is the label of the vector, and the rest are float values, which are the dimensions of the feature vectors used in the classifier.
Did I miss some conversion here?
Upvotes: 0
Views: 3352
Reputation: 4039
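The training examples are stored by row in "csv-data.txt", with the first number of each row containing the class label. Your code has the two slices swapped, so X_train ends up as a one-dimensional array of labels. Its shape holds only one value, which is why unpacking X.shape into n_samples and n_features_ fails inside fit(). Therefore you should have: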
X_train = my_training_data[:,1:]
Y_train = my_training_data[:,0]
Note that in the second index of X_train you can leave off the end index, and the slice will automatically run to the end (of course you can be explicit for clarity; this is just FYI).
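To make the shapes concrete, here is a small sketch. The three-row CSV text is a made-up stand-in for csv-data.txt (an assumption, not the asker's data), read from memory with io.StringIO. It shows that slicing column 0 gives a 1-D array, while the slice 1: gives the 2-D feature matrix that fit() expects.

```python
import io
import numpy as np

# Hypothetical stand-in for csv-data.txt: label first, then float features.
csv_text = "0,1.5,2.5,3.5\n1,4.5,5.5,6.5\n0,7.5,8.5,9.5\n"
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',')

wrong_X = data[:, 0]   # what the question used as X: 1-D, labels only
X_train = data[:, 1:]  # 2-D: every column after the label
Y_train = data[:, 0]   # 1-D: the class labels

print(wrong_X.shape)   # one value, so "n_samples, n_features = X.shape" fails
print(X_train.shape)   # two values: (n_samples, n_features)

# Leaving off the end index is equivalent to spelling it out.
same = np.array_equal(data[:, 1:], data[:, 1:data.shape[1]])
print(same)
```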
Also, there is no need to call tolist() in your call to fit(): X_train and Y_train are already numpy ndarrays, and fit() will convert its arguments back to numpy ndarrays if they are lists. You can pass the arrays directly:
clf.fit(X_train, Y_train)
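Putting the pieces together, a minimal end-to-end sketch might look like the following. The data here is synthetic (an assumption, since csv-data.txt is not available), laid out the same way as the file: label in column 0, features in the remaining columns. The random_state arguments are only there to make the run repeatable.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the contents of csv-data.txt (assumption):
# 40 rows, class label in column 0, five float features after it.
rng = np.random.RandomState(0)
labels = np.repeat([0.0, 1.0], 20)
features = rng.rand(40, 5) + labels[:, None]  # shift class 1 so the classes differ
my_training_data = np.column_stack([labels, features])

X_train = my_training_data[:, 1:]  # 2-D feature matrix
Y_train = my_training_data[:, 0]   # 1-D label vector

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, Y_train)          # ndarrays are passed directly, no tolist()

print(clf.predict(X_train[:3]))
```

With the real file, the synthetic block would be replaced by the genfromtxt call from the question.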
Upvotes: 3