Reputation: 6464
I have training data in a CSV file where the first element of each row is the result (the target label) and the remaining elements make up the feature vector.
I was using Weka to train and test various algorithms on this training data. Now I want to use the trained model multiple times to predict the result for feature vectors that are not part of the training data, and I have no idea how to do it. I think I may be able to do it with scikit-learn. Please provide some help.
Upvotes: 0
Views: 3740
Reputation: 40159
Just slice the loaded array to separate the target column from the feature columns, for instance for a classification problem:
>>> import numpy as np
>>> from sklearn.ensemble import ExtraTreesClassifier
>>> data_train = np.loadtxt('data_train.csv', delimiter=',')
>>> X = data_train[:, 1:]
>>> y = data_train[:, 0].astype(int)  # np.int was removed in NumPy 1.24; the builtin int works everywhere
>>> clf = ExtraTreesClassifier(n_estimators=100).fit(X, y)
Then make predictions on the test data, which does not have the target label in the first column, so every column is a feature in the same order as in training:
>>> data_test = np.loadtxt('data_test.csv', delimiter=',')
>>> print(clf.predict(data_test))
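Since the question asks about using the trained model multiple times, here is a minimal sketch of one way to do that: save the fitted classifier with joblib, reload it later, and predict for a single new feature vector. The synthetic data, the four-feature layout, the `model.joblib` file name, and the temporary directory are assumptions made for illustration only; in practice you would fit on the arrays loaded from your own CSV.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in for the CSV data (an assumption for this sketch):
# label in column 0, four feature columns after it.
rng = np.random.RandomState(0)
features = rng.rand(200, 4)
labels = (features[:, 0] > 0.5).astype(int)
data_train = np.column_stack([labels, features])

# Same slicing as above: first column is the target, the rest are features.
X = data_train[:, 1:]
y = data_train[:, 0].astype(int)
clf = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)

# Save the fitted model once (hypothetical file name in a temp directory).
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(clf, model_path)

# Later, possibly in another process, reload it without retraining.
restored = joblib.load(model_path)

# predict expects a 2D array of shape (n_samples, n_features),
# so a single feature vector is reshaped to (1, n_features).
new_vector = np.array([0.9, 0.2, 0.4, 0.7])
prediction = restored.predict(new_vector.reshape(1, -1))
print(prediction)
```

The reloaded model can be called as often as needed. Note that joblib files are generally only safe to load with the same scikit-learn version that wrote them, and should not be loaded from untrusted sources.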
Upvotes: 5