Rahul

Reputation: 93

Prediction in Weka using the Explorer

Once I have trained and generated a model, the examples I have seen so far all use a test set that has to contain values for both the actual and the predicted columns. Is there a way to either leave the actual column empty or not use it at all when I am making predictions?

For example, the following is my training set:

@relation supermarket
@attribute 'department1' { t}
@attribute 'department2' { t}
@attribute 'department3' { t}
@attribute value

I am using a test set like this:

@relation supermarket
@attribute 'department1' { t}
@attribute 'department2' { t}
@attribute 'department3' { t}
@attribute value

and the output looks like this:

@relation supermarket
@attribute 'department1' { t}
@attribute 'department2' { t}
@attribute 'department3' { t}
@attribute value
@attribute predicted-value
@attribute predicted-margin

My question is: can I either remove the value attribute from the test set or leave it empty?

Upvotes: 0

Views: 239

Answers (1)

Rushdi Shams

Reputation: 2423

Case 1: Both your training and test set have class labels

Training:

@relation simple-training
@attribute feature1 numeric
@attribute feature2 numeric
@attribute class {a,b}
@data
1, 2, b
2, 4, a
.......

Testing:

@relation simple-testing
@attribute feature1 numeric
@attribute feature2 numeric
@attribute class {a,b}
@data
7, 12, a
8, 14, a
.......

In this case, whether you are using k-fold CV or a train-test setup, Weka does not look at the class labels in the test set while predicting. It builds its model from the training data, applies it blindly to the test set, and then compares its predictions with the actual class labels in the test set.

This is useful when you want to evaluate the performance of your classifier.
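The comparison described above boils down to counting how often the predicted label agrees with the actual one. Here is a minimal sketch in Python, not Weka's own code; the function name `accuracy` and the label lists are made up for illustration.

```python
def accuracy(actual, predicted):
    """Fraction of instances whose predicted label equals the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one instance")
    # Count positions where the two label sequences agree.
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical labels, for illustration only.
actual = ["a", "a", "b", "b"]
predicted = ["a", "b", "b", "b"]
print(accuracy(actual, predicted))  # 3 of the 4 labels agree
```

Weka's evaluation output reports this figure as the percentage of correctly classified instances, alongside other statistics.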

Case 2: You have class labels for training data but you don't have class labels for testing data.

Training:

@relation simple-training
@attribute feature1 numeric
@attribute feature2 numeric
@attribute class {a,b}
@data
1, 2, b
2, 4, a
.......

Testing:

@relation simple-testing
@attribute feature1 numeric
@attribute feature2 numeric
@attribute class {a,b}
@data
7, 12, ?
8, 14, ?
.......

This is the normal situation, because it is exactly what a trained model is for: labeling unseen, unlabeled data. In that case, simply put a ? in place of each class label in the test set. After running Weka on this setup, the output will show the predicted values in place of the ? marks. You do not need to create any additional column for the predictions; doing so will give you an error.
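If you already have a test file with labels filled in and want to turn it into an unlabeled one, the ? substitution can be scripted. The following is only a sketch: the helper name `blank_class_column` is hypothetical, and it assumes dense (non-sparse) data rows, the class as the last attribute, and no quoted value containing a comma.

```python
def blank_class_column(arff_text):
    """Return the ARFF text with the last value of every data row set to '?'.

    Assumptions: dense rows, class is the last attribute,
    and no quoted value contains a comma.
    """
    out = []
    in_data = False
    for line in arff_text.splitlines():
        stripped = line.strip()
        if not in_data:
            # Header lines are copied unchanged.
            out.append(line)
            if stripped.lower() == "@data":
                in_data = True
            continue
        # Leave blank lines and ARFF comments (starting with %) untouched.
        if not stripped or stripped.startswith("%"):
            out.append(line)
            continue
        # Split off the last comma-separated value and replace it with '?'.
        head, sep, _ = stripped.rpartition(",")
        out.append(f"{head}{sep} ?" if sep else "?")
    return "\n".join(out)


labeled = """@relation simple-testing
@attribute feature1 numeric
@attribute feature2 numeric
@attribute class {a,b}
@data
7, 12, a
8, 14, a"""

print(blank_class_column(labeled))
```

The header is left untouched on purpose, so the class attribute is still declared and only the values in the data rows change.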

So, in a nutshell: your training and test data must be compatible, meaning the same attributes declared in the same order. If you do not know a value in the test data and want to predict it, put a ? in that column.
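A quick way to check compatibility before loading the files is to compare their @attribute declarations. The sketch below is a purely textual check under stated assumptions, not what Weka does internally: the helper names are hypothetical, and a cosmetic difference inside a declaration, such as `{a, b}` versus `{a,b}`, is reported as a mismatch.

```python
def attribute_declarations(arff_text):
    """Collect the @attribute declarations of an ARFF header, in order."""
    decls = []
    for line in arff_text.splitlines():
        stripped = line.strip()
        if stripped.lower() == "@data":
            break  # the header ends where the data section starts
        if stripped.lower().startswith("@attribute"):
            # Drop the keyword and collapse runs of whitespace.
            decls.append(" ".join(stripped.split()[1:]))
    return decls


def headers_match(train_text, test_text):
    """True if both files declare the same attributes in the same order."""
    return attribute_declarations(train_text) == attribute_declarations(test_text)


train = """@relation simple-training
@attribute feature1 numeric
@attribute feature2 numeric
@attribute class {a,b}
@data
1, 2, b"""

test = """@relation simple-testing
@attribute  feature1   numeric
@attribute feature2 numeric
@attribute class {a,b}
@data
7, 12, ?"""

# The relation names differ, but the attribute declarations agree.
print(headers_match(train, test))
```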

Upvotes: 1
