thomas

Reputation: 634

OpenCV MLP with Sigmoid Neurons, Output range

I have searched for answers here on SO and google to the following question, but haven't found anything, so here is my situation:

I want to build an MLP that learns a similarity function. I have training and test samples, and the MLP is set up and running. My problem is knowing from which value range the teacher outputs have to come.

Here is the relevant part of my code:

CvANN_MLP_TrainParams params(
    cvTermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 1000, 0.000001),
    CvANN_MLP_TrainParams::BACKPROP,
    0.1,
    0.1);

Mat layers = (Mat_<int>(3,1) << FEAT_SIZE, H_NEURONS, 1);

CvANN_MLP net(layers, CvANN_MLP::SIGMOID_SYM, 1, 1);

int iter = net.train(X, Y, Mat(), Mat(), params);

net.predict(X_test, predictions);

The number of input and hidden neurons is set somewhere else, and the net has 1 output neuron. X, Y, X_test are Mats containing the training and test samples, no problem here. The question is from which value range my Y's have to come, and in which value range the predictions will lie.

In the documentation I have found the following statements:

For training:

If you are using the default cvANN_MLP::SIGMOID_SYM activation function then the output should be in the range [-1,1], instead of [0,1], for optimal results.

Since I'm NOT using the default sigmoid function (the one with alpha=0 and beta=0), I'm providing my Y's from [0,1]. Is this right, or do they mean something else by 'default sigmoid function'? I'm asking this because for prediction they explicitly mention alpha and beta:

If you are using the default cvANN_MLP::SIGMOID_SYM activation function with the default parameter values fparam1=0 and fparam2=0 then the function used is y = 1.7159*tanh(2/3 * x), so the output will range from [-1.7159, 1.7159], instead of [0,1].

Again, since I'm not using the default sigmoid function, I assume to get predictions from [0,1]. Am I right so far?

What is confusing me here is that I've found another question regarding the output range of OpenCV's sigmoid function, that says the range has to be [-1,1].

And now comes the real confusion: When I train the net and let it make some predictions, I get values slightly larger than 1 (around 1.03), regardless if my Y's come from [0,1] or [-1,1]. And this shouldn't happen in either case.

Could somebody please enlighten me? Am I missing something here?

Thanks in advance.

EDIT:

To make things very clear, I came up with a small example that shows the problem:

#include <iostream>
#include <opencv2/core/core.hpp>
#include <opencv2/ml/ml.hpp>

using namespace cv;
using namespace std;

int main() {

    int POS = 1;
    int NEG = -1;

    int SAMPLES = 100;
    float SPLIT = 0.8;

    float C_X = 0.5;
    float C_Y = 0.5;
    float R = 0.3;

    Mat X(SAMPLES, 2, CV_32FC1);
    Mat Y(SAMPLES, 1, CV_32FC1);

    randu(X, 0, 1);

    for(int i = 0; i < SAMPLES; i++){
        Y.at<float>(i,0) = pow((X.at<float>(i,0) - C_X),2) + pow((X.at<float>(i,1) - C_Y),2) < pow(R,2) ? POS : NEG;
    }

    Mat X_train = X(Range(0, (int)(SAMPLES*SPLIT)), Range::all());
    Mat Y_train = Y(Range(0, (int)(SAMPLES*SPLIT)), Range::all());

    Mat X_test = X(Range((int)(SAMPLES*SPLIT), SAMPLES), Range::all());
    Mat Y_test = Y(Range((int)(SAMPLES*SPLIT), SAMPLES), Range::all());

    CvANN_MLP_TrainParams params(
                 cvTermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 1000, 0.000001),
                 CvANN_MLP_TrainParams::BACKPROP,
                 0.1,
                 0.1);

    Mat layers = (Mat_<int>(3,1) << 2, 4, 1);

    CvANN_MLP net(layers, CvANN_MLP::SIGMOID_SYM, 1, 1);
    net.train(X_train, Y_train, Mat(), Mat(), params);

    Mat predictions(Y_test.size(), CV_32F); 
    net.predict(X_test, predictions);

    cout << predictions << endl;

    Mat error = predictions-Y_test;
    multiply(error, error, error);

    float mse = sum(error)[0]/error.rows;

    cout << "MSE: " << mse << endl;

    return 0;
}

This code generates a set of random points from the unit square and assigns the label POS or NEG to each of them, depending on whether it lies inside the circle given by C_X, C_Y and R. Then a training and a test set are generated and the MLP is trained. Now we have two situations:

  1. POS = 1, NEG = -1:

Output is provided to the net as it should be for tanh neurons (from [-1,1]), and I expect predictions from that range. But I also get predictions like -1.018 or 1.052. The mean squared error in this case was 0.13071 for me.

  2. POS = 1, NEG = 0:

The output is given in the range the documentation calls optimal (at least that is how I read it). And since I'm not using the default sigmoid function, I expect predictions from [0,1]. But I also get values like 1.0263158, and even negative ones. The MSE in this case is better, at 0.0326775.

I know, this example is a classification problem and normally I would just round the values to the closest label, but I want to learn a similarity function and have to rely on the predictions to come from some fixed range.

Upvotes: 4

Views: 2800

Answers (2)

Hyunjun Kim

Reputation: 124

This answer is late, but I'm writing it for other people with the same question.

If you look at setActivationFunction() and calc_activ_func() in ann_mlp.cpp, the sigmoid returns values within [-1.7159, 1.7159] when you set fparam1 and fparam2 to 0. You can change the slope and range by adjusting fparam1 and fparam2.

The function is called a symmetric sigmoid, but it actually computes a scaled tanh. If you want the real (logistic) sigmoid function, I think you need to implement it yourself.

Upvotes: 1

Matthew Spencer

Reputation: 2295

It really comes down to the Activation Function that is being applied for your MLP.

There are a number of different activation functions that squash the value of an artificial neuron down to a defined range (the most common ones I am familiar with are the hyperbolic tangent and the logistic function, but many others exist). Perhaps the one you are using for your neurons is scaled to go outside the 0 to 1 range.

As for the documentation's note about optimal results, it encourages formatting the target data over the full output range of the activation function, so that the MLP learns over the whole range rather than a subset of it, which could otherwise reduce its learning capacity.

Upvotes: 0
