Sethuraman

Reputation: 95

Regression in Weka: does test data for prediction require a class value?

I am using the M5P tree model from weka.classifiers through python-weka-wrapper. Each row in my ARFF file consists of 6 attributes, the 6th being the target variable the model is trained on. I load the ARFF file with weka.core.converters.ArffLoader to train the model. After training, to make predictions on some test data, I create instances and pass them to the built model. Each instance contains only the values of the 5 input attributes, not the target variable's value. I am getting a Java exception:

Traceback (most recent call last):
  File "C:/Users/Sethuraman/PycharmProjects/Test_printer/m_M5P.py", line 85, in <module>
    pred_dict1[index + 1] = cls.classify_instance(instance)
  File "C:\Users\Sethuraman\Anaconda2\lib\site-packages\python_weka_wrapper-0.3.8-py2.7.egg\weka\classifiers.py", line 105, in classify_instance
    return self.__classify(inst.jobject)
  File "C:\Users\Sethuraman\Anaconda2\lib\site-packages\javabridge-1.0.14-py2.7-win-amd64.egg\javabridge\jutil.py", line 852, in fn
    raise JavaException(x)
javabridge.jutil.JavaException: Src and Dest differ in # of attributes: 5 != 6

Why should I provide the target variable's value? Is it really necessary to pass it as well? After training, the model itself is supposed to predict the target value. If it is required, why? If not, how do I deal with this error? Please help!

Upvotes: 0

Views: 567

Answers (2)

fracpete

Reputation: 2608

You can use the Add filter to introduce a new attribute. By default, this filter marks all values of the new attribute as missing ("?"). Just make sure that the name of this new attribute and, in the case of a nominal class, the order of the class labels are exactly the same as in the training data.
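As a rough sketch of how this might look with python-weka-wrapper: the helper names `add_filter_options` and `append_missing_target` are made up for illustration, and the target is assumed to be numeric (as it would be for M5P) and to sit in the last position. The filter class `weka.filters.unsupervised.attribute.Add` and its `-T`, `-N` and `-C` options come from Weka itself.

```python
def add_filter_options(target_name):
    """Options for Weka's Add filter: append a numeric attribute at the end."""
    # -T NUM  : the new attribute is numeric (assumed, since M5P does regression)
    # -N name : must match the target attribute's name in the training data
    # -C last : insert the new attribute as the last column
    return ["-T", "NUM", "-N", target_name, "-C", "last"]


def append_missing_target(data, target_name):
    """Return a copy of `data` with an all-missing target attribute appended.

    `data` is expected to be a python-weka-wrapper Instances object holding
    only the input attributes; the JVM must already be running.
    """
    # Imported lazily so add_filter_options() can be used without a JVM.
    from weka.filters import Filter

    add = Filter(classname="weka.filters.unsupervised.attribute.Add",
                 options=add_filter_options(target_name))
    add.inputformat(data)
    filtered = add.filter(data)
    filtered.class_is_last()  # mark the new (missing) attribute as the class
    return filtered
```

After starting the JVM and loading the 5-attribute test data, something like `test = append_missing_target(test, "target")` (with "target" replaced by the real attribute name from the training file) should yield instances with 6 attributes that can be passed to `cls.classify_instance`.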

Upvotes: 1

Will Molter

Reputation: 530

If you want validation, you should definitely provide target values; how does the algorithm know how well it's done otherwise? But if you just want it to predict on that set, it seems the best way is to fill the target spot with '?', so that the data still has the 6 attributes, with the target simply marked as unknown. See http://weka.wikispaces.com/Making+predictions for more.
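A minimal sketch of the '?' approach, assuming all attributes are numeric: the function `build_test_arff` and the relation and attribute names in the usage note are hypothetical, and only the standard library is used. It writes the test rows as ARFF text with the target column filled with '?', so the file has the same 6-attribute layout as the training data.

```python
def build_test_arff(relation, attribute_names, target_name, rows):
    """Build ARFF text whose target column is '?' (missing) in every row.

    attribute_names: names of the input attributes, in training order.
    target_name:     name of the target attribute, as in the training file.
    rows:            iterable of input-value sequences (target omitted).
    """
    lines = ["@relation " + relation, ""]
    # Declare the inputs followed by the target, all assumed numeric here.
    for name in list(attribute_names) + [target_name]:
        lines.append("@attribute %s numeric" % name)
    lines.extend(["", "@data"])
    for row in rows:
        # Append '?' so each row still has one value per declared attribute.
        lines.append(",".join([str(v) for v in row] + ["?"]))
    return "\n".join(lines) + "\n"
```

For example, `build_test_arff("printer", ["a1", "a2", "a3", "a4", "a5"], "target", [[1, 2, 3, 4, 5]])` produces a data row `1,2,3,4,5,?`. The resulting text can be saved to a file and loaded the same way as the training ARFF file.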

Upvotes: 0
