Twisted Tea

Reputation: 389

LSTM in DL4J - All output values are the same

I'm trying to create a simple LSTM using DeepLearning4J, with 2 input features and a timeseries length of 1. I'm having a strange issue, however: after training the network, inputting test data yields the same, arbitrary result regardless of the input values. My code is shown below.

(UPDATED)

public class LSTMRegression {
    public static final int inputSize = 2,
                            lstmLayerSize = 4,
                            outputSize = 1;
    
    public static final double learningRate = 0.0001;

    public static void main(String[] args) {
        int miniBatchSize = 99;
        
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .miniBatch(false)
                .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                .updater(new Adam(learningRate))
                .list()
                .layer(0, new LSTM.Builder().nIn(inputSize).nOut(lstmLayerSize)
                        .weightInit(WeightInit.XAVIER)
                        .activation(Activation.TANH).build())
//                .layer(1, new LSTM.Builder().nIn(lstmLayerSize).nOut(lstmLayerSize)
//                        .weightInit(WeightInit.XAVIER)
//                        .activation(Activation.SIGMOID).build())
//                .layer(2, new LSTM.Builder().nIn(lstmLayerSize).nOut(lstmLayerSize)
//                        .weightInit(WeightInit.XAVIER)
//                        .activation(Activation.SIGMOID).build())
                .layer(1, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MSE)
                        .weightInit(WeightInit.XAVIER)
                        .activation(Activation.IDENTITY)
                        .nIn(lstmLayerSize).nOut(outputSize).build())
                
                .backpropType(BackpropType.TruncatedBPTT)
                .tBPTTForwardLength(miniBatchSize)
                .tBPTTBackwardLength(miniBatchSize)
                .build();
        
        final var network = new MultiLayerNetwork(conf);
        final DataSet train = getTrain();
        final INDArray test = getTest();
        
        final DataNormalization normalizer = new NormalizerMinMaxScaler(0, 1);
//                                          = new NormalizerStandardize();
        
        normalizer.fitLabel(true);
        normalizer.fit(train);

        normalizer.transform(train);
        normalizer.transform(test);
        
        network.init();
        
        for (int i = 0; i < 100; i++)
            network.fit(train);
        
        final INDArray output = network.output(test);
        
        normalizer.revertLabels(output);
        
        System.out.println(output);
    }
    
    public static INDArray getTest() {
        double[][][] test = new double[][][]{
            {{20}, {203}},
            {{16}, {183}},
            {{20}, {190}},
            {{18.6}, {193}},
            {{18.9}, {184}},
            {{17.2}, {199}},
            {{20}, {190}},
            {{17}, {181}},
            {{19}, {197}},
            {{16.5}, {198}},
            ...
        };
        
        INDArray input = Nd4j.create(test);
        
        return input;
    }
    
    public static DataSet getTrain() {
        double[][][] inputArray = {
            {{18.7}, {181}},
            {{17.4}, {186}},
            {{18}, {195}},
            {{19.3}, {193}},
            {{20.6}, {190}},
            {{17.8}, {181}},
            {{19.6}, {195}},
            {{18.1}, {193}},
            {{20.2}, {190}},
            {{17.1}, {186}},
            ...
        };
        
        double[][] outputArray = {
                {3750},
                {3800},
                {3250},
                {3450},
                {3650},
                {3625},
                {4675},
                {3475},
                {4250},
                {3300},
                ...
        };
        
        INDArray input = Nd4j.create(inputArray);
        INDArray labels = Nd4j.create(outputArray);
        
        return new DataSet(input, labels);
    }
}

Here's an example of the output:

(UPDATED)

00:06:04.554 [main] WARN  o.d.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [99, 2, 1] and labels with shape [99, 1]
00:06:04.554 [main] WARN  o.d.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [99, 2, 1] and labels with shape [99, 1]
00:06:04.555 [main] WARN  o.d.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [99, 2, 1] and labels with shape [99, 1]
00:06:04.555 [main] WARN  o.d.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [99, 2, 1] and labels with shape [99, 1]
00:06:04.555 [main] WARN  o.d.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [99, 2, 1] and labels with shape [99, 1]
00:06:04.555 [main] WARN  o.d.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [99, 2, 1] and labels with shape [99, 1]
00:06:04.555 [main] WARN  o.d.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [99, 2, 1] and labels with shape [99, 1]
00:06:04.555 [main] WARN  o.d.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [99, 2, 1] and labels with shape [99, 1]
00:06:04.555 [main] WARN  o.d.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [99, 2, 1] and labels with shape [99, 1]
00:06:04.555 [main] WARN  o.d.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [99, 2, 1] and labels with shape [99, 1]

[[[3198.1614]], 

 [[2986.7781]], 

 [[3059.7017]], 

 [[3105.3828]], 

 [[2994.0127]], 

 [[3191.4468]], 

 [[3059.7017]], 

 [[2962.4341]], 

 [[3147.4412]], 

 [[3183.5991]]]

So far I've tried changing a number of hyperparameters, including the updater (previously Adam), the activation function in the hidden layers (previously ReLU), and the learning rate, none of which fixed the issue.

Thank you.

Upvotes: 1

Views: 524

Answers (1)

Adam Gibson

Reputation: 3205

This is always either a tuning issue or an input data issue. In your case, your input data is wrong.

You almost always need to normalize your input data or your network won't learn anything. This is also true for your outputs: your labels should be normalized as well.
Snippets below:

    //Normalize data, including labels (fitLabel=true)
    NormalizerMinMaxScaler normalizer = new NormalizerMinMaxScaler(0, 1);
    normalizer.fitLabel(true);
    normalizer.fit(trainData);              //Collect training data statistics

    normalizer.transform(trainData);
    normalizer.transform(testData);

Here's how to revert:

    //Revert data back to original values for plotting
    normalizer.revert(trainData);
    normalizer.revert(testData);
    normalizer.revertLabels(predicted);

There are different kinds of normalizers; the one above just scales to the range 0 to 1. Sometimes NormalizerStandardize could be better here. That will normalize the data by subtracting the mean and dividing by the standard deviation. That will look something like this:

    NormalizerStandardize myNormalizer = new NormalizerStandardize();
    myNormalizer.fitLabel(true);
    myNormalizer.fit(sampleDataSet);
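
Applying it afterwards works the same way as the min/max scaler above; the variable names below just mirror the earlier snippets and are assumptions about your setup:

    // Apply the fitted statistics to both datasets, then undo the label
    // scaling on the network's predictions before reporting them.
    myNormalizer.transform(trainData);
    myNormalizer.transform(testData);
    myNormalizer.revertLabels(predicted);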

Afterwards your network should train normally.

Edit: If that doesn't work: because of the size of your dataset, DL4J also has a knob (I explained this in my comment below) that is normally true, where we assume your data is a minibatch. On most reasonable problems (read: not 10 data points) this works; otherwise the training can be all over the place. We can turn off the minibatch assumption with:

    ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
            .miniBatch(false)

The same is true for MultiLayerNetwork as well.
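
For example, a minimal sketch of the same switch on the MultiLayerConfiguration builder used in the question (the rest of the builder chain is omitted here):

    // Same miniBatch(false) knob on a MultiLayerConfiguration builder;
    // updater, layers, etc. stay as in the question.
    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .miniBatch(false)
            // ... rest of the builder chain as in the question ...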

Also of note: your architecture is vastly overkill for what is a VERY small, unrealistic problem for DL. DL usually requires a lot more data to work properly; that is why you see layers stacked multiple times. For a problem like this I would suggest reducing the number of layers to 1.

At each layer, what's essentially happening is a form of compression of information. When your number of data points is small, you eventually lose signal through the network once you've saturated it. Subsequent layers tend not to learn very well in that case.
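
As a rough illustration, a pared-down single-LSTM-layer configuration might look like the sketch below; the layer size and learning rate are placeholder assumptions, not tuned values:

    // Sketch: one LSTM layer feeding an RNN output layer for regression.
    // nOut(10) and Adam(1e-3) are illustrative guesses, not recommendations.
    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .miniBatch(false)
            .updater(new Adam(1e-3))
            .weightInit(WeightInit.XAVIER)
            .list()
            .layer(0, new LSTM.Builder()
                    .nIn(2)                              // two input features
                    .nOut(10)                            // small hidden state
                    .activation(Activation.TANH).build())
            .layer(1, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MSE)
                    .activation(Activation.IDENTITY)
                    .nIn(10).nOut(1).build())
            .build();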

Upvotes: 2
