Steve

Reputation: 3068

why doesn't liblinear predict the majority class?

Most machine learning classifiers, when they encounter an instance containing no features seen during training, classify it with whichever class was most frequent in the training data.

This doesn't seem to be the case with liblinear-java, and I'm wondering why. Here's some code constructing a problem with two features, where the training data has 4 times as many 0 labels as 1 labels:

Problem problem = new Problem();
problem.l = 5;
problem.n = 2;
problem.x = new FeatureNode[][] {
  new FeatureNode[] { new FeatureNode(1, 1) },  
  new FeatureNode[] { new FeatureNode(1, 1) },  
  new FeatureNode[] { new FeatureNode(1, 1) },  
  new FeatureNode[] { new FeatureNode(1, 1) },  
  new FeatureNode[] { new FeatureNode(2, 1) },  
};
problem.y = new int[] {0, 0, 0, 0, 1};

Parameter parameter = new Parameter(SolverType.L2R_L2LOSS_SVC, 1.0, 0.01);
Model model = Linear.train(problem, parameter);

Now let's test this on a new feature, 3, which wasn't in the training data. Since the trained model knows nothing about feature 3, I would have expected that the predicted class would be 0, the most common class in the training data.

FeatureNode[] instance = new FeatureNode[] { new FeatureNode(3, 1) };
int prediction = Linear.predict(model, instance);
System.err.println(prediction);

That last line prints out 1 however. Why is that?

Upvotes: 1

Views: 707

Answers (1)

Steve

Reputation: 3068

I believe this is what the "-B" (bias) argument to the command-line version of liblinear is intended to fix. That argument isn't available when you create the FeatureNodes directly, but it's essentially equivalent to adding a new FeatureNode(1, 1) at the beginning of every FeatureNode[] (and shifting the other feature indices up by one). If I follow this approach, adding the bias feature both during training and during classification, everything works. Here's what that code looks like:

Problem problem = new Problem();
problem.l = 5;
problem.n = 3;
problem.x = new FeatureNode[][] {
  new FeatureNode[] { new FeatureNode(1, 1), new FeatureNode(2, 1) },  
  new FeatureNode[] { new FeatureNode(1, 1), new FeatureNode(2, 1) },  
  new FeatureNode[] { new FeatureNode(1, 1), new FeatureNode(2, 1) },  
  new FeatureNode[] { new FeatureNode(1, 1), new FeatureNode(2, 1) },  
  new FeatureNode[] { new FeatureNode(1, 1), new FeatureNode(3, 1) },  
};
problem.y = new int[] {0, 0, 0, 0, 1};

Parameter parameter = new Parameter(SolverType.L2R_L2LOSS_SVC, 1.0, 0.01);
Model model = Linear.train(problem, parameter);
FeatureNode[] instance = new FeatureNode[] { new FeatureNode(1, 1), new FeatureNode(4, 1) };
int prediction = Linear.predict(model, instance);
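Since the API offers no -B equivalent when you build the FeatureNode arrays by hand, the transformation can be factored into a small helper. The sketch below uses a minimal Node class as a stand-in for liblinear's FeatureNode, purely so the example runs on its own:

```java
// Hedged sketch; Node is a stand-in for liblinear's FeatureNode (index, value)
// so this example compiles without the library.
public class BiasDemo {
    static class Node {
        final int index;
        final double value;
        Node(int index, double value) { this.index = index; this.value = value; }
    }

    // Prepend a constant bias feature at index 1 (value 1) and shift the real
    // feature indices up by one -- the same transformation applied by hand in
    // the training code above.
    static Node[] withBias(Node[] instance) {
        Node[] out = new Node[instance.length + 1];
        out[0] = new Node(1, 1);
        for (int i = 0; i < instance.length; i++) {
            out[i + 1] = new Node(instance[i].index + 1, instance[i].value);
        }
        return out;
    }

    public static void main(String[] args) {
        // The unseen feature 3 from the original question becomes feature 4
        // after shifting, matching the classification instance above.
        Node[] biased = withBias(new Node[] { new Node(3, 1) });
        for (Node n : biased) {
            System.out.println(n.index + ":" + n.value); // prints 1:1.0 then 4:1.0
        }
    }
}
```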

To figure out why the bias feature is necessary, I dug into the liblinear-java code a bit. Here's what the prediction code looks like:

for (int i = 0; i < nr_w; i++)
    dec_values[i] = 0;

for (FeatureNode lx : x) {
    int idx = lx.index;
    // the dimension of testing data may exceed that of training
    if (idx <= n) {
        for (int i = 0; i < nr_w; i++) {
            dec_values[i] += w[(idx - 1) * nr_w + i] * lx.value;
        }
    }
}

So, when none of an instance's features were seen during training, we just get a dec_values (decision values) array of all zeros, meaning no class is preferred over any other; the final prediction falls through to a fixed tie-break rather than to the majority class. It's therefore crucial that at least one feature seen during training is present in every instance seen during classification.
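If I'm reading liblinear's two-class prediction correctly, the label is chosen by the sign of the single decision value (something like dec_values[0] > 0 ? label[0] : label[1], with labels ordered by first appearance in the training data). That would explain why the all-zero case yields 1 here: 0 > 0 is false, so label[1] wins. A sketch of that rule (my reconstruction, not the library's exact code):

```java
// Hedged reconstruction of liblinear's binary tie-break: with one weight
// vector, the sign of the decision value picks between the two labels.
public class TieBreakDemo {
    static int predict(double decValue, int[] label) {
        return decValue > 0 ? label[0] : label[1];
    }

    public static void main(String[] args) {
        // Labels in order of first appearance in the training data: 0 then 1.
        int[] label = {0, 1};
        // An instance with only unseen features contributes nothing, so the
        // decision value is 0, and 0 > 0 is false: label[1] wins.
        System.out.println(predict(0.0, label)); // prints 1
    }
}
```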

Adding a 'bias' feature with a constant value (e.g. 1) solves this problem, because the model learns a default weight that applies to any new instance. In the code above, the model learns a weight of 0.0869565217391306 for the bias feature, meaning that it correctly learned to favor class 0 over class 1.
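To see the effect numerically: for an instance whose only recognized feature is the bias, the decision value is just the bias weight times 1, which is positive and therefore selects class 0. A small sketch using the weight reported above:

```java
// Sketch of the decision value for an otherwise-unseen instance, using the
// bias weight the model above happened to learn.
public class BiasWeightDemo {
    static double decisionValue(double biasWeight, double biasValue) {
        // Only the bias feature fires, so the sum reduces to a single term.
        return biasWeight * biasValue;
    }

    public static void main(String[] args) {
        double dec = decisionValue(0.0869565217391306, 1.0);
        // A positive decision value selects the first label: class 0,
        // the majority class from the training data.
        System.out.println(dec > 0); // prints true
    }
}
```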

Upvotes: 2
