Reputation: 13
I'm trying to do a simple prediction in DL4J (I plan to use it later on a large dataset with n features), but no matter what I do, my network doesn't learn and behaves very strangely. I have studied all the tutorials and followed the same steps shown in the dl4j repo, but somehow it doesn't work for me.
For dummy feature data I use:
double[val][x] features; where val = linspace(-10, 10)...; and x = Math.sqrt(Math.abs(val)) * val;
My labels are: double[y] labels; where y = Math.sin(val) / val
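As a sketch, the dummy data described above could be generated in plain Java roughly as follows. The point count (101) and the names `DummyData`, `linspace`, `feature` and `label` are assumptions made for illustration and do not come from the original project. Note that sin(val) / val evaluates to 0/0 at val = 0, so the sketch substitutes the limit value 1.0 there; without such a guard one label would be NaN if the grid contains 0.

```java
final class DummyData {
    /** Returns n evenly spaced values from start to end, both inclusive (n >= 2). */
    static double[] linspace(double start, double end, int n) {
        double[] out = new double[n];
        double step = (end - start) / (n - 1);
        for (int i = 0; i < n; i++) {
            out[i] = start + i * step;
        }
        return out;
    }

    /** Second feature column: x = sqrt(|val|) * val. */
    static double feature(double val) {
        return Math.sqrt(Math.abs(val)) * val;
    }

    /** Label: sin(val) / val, using the limit 1.0 near val = 0 to avoid NaN. */
    static double label(double val) {
        return Math.abs(val) < 1e-12 ? 1.0 : Math.sin(val) / val;
    }

    public static void main(String[] args) {
        double[] vals = linspace(-10, 10, 101); // 101 points is an assumed size
        for (int i = 0; i < 5; i++) {
            System.out.printf("val=%.2f x=%.4f y=%.4f%n",
                    vals[i], feature(vals[i]), label(vals[i]));
        }
    }
}
```

Checking the generated labels for NaN or infinite values before training is a cheap sanity test, since a single NaN label is enough to make the loss NaN.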
DataSetIterator dataset_train_iter = getTrainingData(x_features, y_outputs_train, batchSize, rnd);
DataSetIterator dataset_test_iter = getTrainingData(x_features_test, y_outputs_test, batchSize, rnd);
// Normalize the features to [0, 1]; labels are not normalized here because fitLabel is set to false
NormalizerMinMaxScaler normalizer = new NormalizerMinMaxScaler(0, 1);
normalizer.fitLabel(false);
normalizer.fit(dataset_train_iter);
normalizer.fit(dataset_test_iter);
// Use the .transform function only if you are working with a small dataset and no iterator
normalizer.transform(dataset_train_iter.next());
normalizer.transform(dataset_test_iter.next());
dataset_train_iter.setPreProcessor(normalizer);
dataset_test_iter.setPreProcessor(normalizer);
//DataSet setNormal = dataset.next();
//Create the network
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(seed)
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
    .weightInit(WeightInit.XAVIER)
    //.miniBatch(true)
    //.l2(1e-4)
    //.activation(Activation.TANH)
    .updater(new Nesterovs(0.1,0.3))
    .list()
    .layer(new DenseLayer.Builder().nIn(numInputs).nOut(20).activation(Activation.TANH)
        .build())
    .layer(new DenseLayer.Builder().nIn(20).nOut(10).activation(Activation.TANH)
        .build())
    .layer(new DenseLayer.Builder().nIn(10).nOut(6).activation(Activation.TANH)
        .build())
    .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
        .activation(Activation.IDENTITY)
        .nIn(6).nOut(1).build())
    .build();
//Train and fit network
final MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();
net.setListeners(new ScoreIterationListener(100));
//Train the network on the full data set, and evaluate it periodically
final INDArray[] networkPredictions = new INDArray[nEpochs / plotFrequency];
for (int i = 0; i < nEpochs; i++) {
// fit() already performs backpropagation; see the DL4J release notes:
// https://deeplearning4j.konduit.ai/release-notes/1.0.0-beta3
net.fit(dataset_train_iter);
dataset_train_iter.reset();
if((i+1) % plotFrequency == 0) networkPredictions[i/ plotFrequency] = net.output(x_features, false);
}
// evaluate and plot
dataset_test_iter.reset();
dataset_train_iter.reset();
INDArray predicted = net.output(dataset_test_iter, false);
System.out.println("PREDICTED ARRAY " + predicted);
INDArray output_train = net.output(dataset_train_iter, false);
//Revert data back to original values for plotting
// normalizer.revertLabels(predicted);
normalizer.revertLabels(output_train);
normalizer.revertLabels(predicted);
PlotUtil.plot(om, y_outputs_train, networkPredictions);
My output then looks very weird (see picture below), even when I use minibatches (1, 20, or 100 samples per batch), change the number of epochs, or add hidden nodes and hidden layers (I tried adding 1000 nodes and 5 layers). The network either outputs very noisy values or one constant y. I just can't figure out what is going wrong here. Why doesn't the network even come close to the training function?
Another question: what does iter.reset() do exactly? Does it move the iterator's pointer back to the first batch (batch 0) in the DataSetIterator?
Upvotes: 0
Views: 387
Reputation: 3205
A pretty common problem when people try toy problems like this is dl4j's assumption of minibatches (which is what 99% of problems use). You aren't actually doing minibatch learning, which defeats the point of using an iterator: an iterator is meant to iterate through slices of a dataset, not a small in-memory dataset. A small recommendation is to just use the normal DataSet API (a DataSet is what dataset.next() returns).
Make sure you turn off the minibatch penalty dl4j applies to all losses with .miniBatch(false). You can see that configuration here: https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/NeuralNetConfiguration.java#L434
A unit test testing this behavior can be found here: https://github.com/eclipse/deeplearning4j/blob/b4047006ac8175df295c2f3c008e7601437ea4dc/deeplearning4j/deeplearning4j-core/src/test/java/org/deeplearning4j/gradientcheck/GradientCheckTests.java#L94
For posterity, here is the relevant configuration:
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder().miniBatch(false)
    .dataType(DataType.DOUBLE)
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).updater(new NoOp())
    .list()
    .layer(0,
        new DenseLayer.Builder().nIn(4).nOut(3)
            .dist(new NormalDistribution(0, 1))
            .activation(Activation.TANH)
            .build())
    .layer(1, new OutputLayer.Builder(LossFunction.MCXENT)
        .activation(Activation.SOFTMAX).nIn(3).nOut(3).build())
    .build();
You'll notice two things: miniBatch is set to false, and the data type is configured as double. You are welcome to try both for your problem. To save memory, dl4j assumes float as the default data type.
This is a reasonable assumption when working on larger problems, but may not work well for toy problems.
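To give a feel for why the data type can matter, here is a small plain-Java sketch comparing float and double accumulation. It only illustrates general floating-point precision and says nothing about dl4j internals; the class and method names are made up for this example.

```java
final class PrecisionDemo {
    /** Adds 0.1f to a float accumulator n times. */
    static float sumFloat(int n) {
        float sum = 0f;
        for (int i = 0; i < n; i++) {
            sum += 0.1f;
        }
        return sum;
    }

    /** Adds 0.1 to a double accumulator n times. */
    static double sumDouble(int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += 0.1;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        double exact = n * 0.1; // the value both sums are trying to reach
        System.out.println("float error:  " + Math.abs(sumFloat(n) - exact));
        System.out.println("double error: " + Math.abs(sumDouble(n) - exact));
    }
}
```

The float accumulator drifts much further from the exact value than the double one. Checks that compare tiny differences, such as numerical gradient checks, are the kind of code that is sensitive to this.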
For reference, you can find the application of the minibatch math here: https://github.com/eclipse/deeplearning4j/blob/fc735d30023981ebbb0fafa55ea9520ec44292e0/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/updater/BaseMultiLayerUpdater.java#L332
This affects the gradient updates.
The score penalty can be found in the output layer: https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/layers/BaseOutputLayer.java#L84
Essentially, both of these automatically apply the minibatch penalty for your dataset, and the effect is reflected in both the loss and the gradient updates.
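As a rough illustration of the two linked code paths, the plain-Java sketch below mimics the scaling on a one-parameter linear model with a squared-error loss: when the flag is on, both the summed score and the summed gradient are divided by the number of examples. The names `MiniBatchScaling`, `score` and `gradient` are hypothetical, and this is a simplified model of the behavior, not dl4j's actual implementation.

```java
final class MiniBatchScaling {
    /** Summed squared error for y ~ w * x, divided by the example count if miniBatch is true. */
    static double score(double w, double[] x, double[] y, boolean miniBatch) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double err = w * x[i] - y[i];
            sum += err * err;
        }
        return miniBatch ? sum / x.length : sum;
    }

    /** Gradient of the summed squared error w.r.t. w, divided by the example count if miniBatch is true. */
    static double gradient(double w, double[] x, double[] y, boolean miniBatch) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double err = w * x[i] - y[i];
            sum += 2.0 * err * x[i];
        }
        return miniBatch ? sum / x.length : sum;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        double[] y = {2, 4, 6, 8};
        double w = 1.0;
        System.out.println("score    miniBatch=false: " + score(w, x, y, false));
        System.out.println("score    miniBatch=true:  " + score(w, x, y, true));
        System.out.println("gradient miniBatch=false: " + gradient(w, x, y, false));
        System.out.println("gradient miniBatch=true:  " + gradient(w, x, y, true));
    }
}
```

With four examples, the scaled score and gradient are a quarter of the unscaled ones, so the same learning rate produces a four times smaller step when the flag is on.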
Upvotes: 1