Alexander Pozharskii

Reputation: 313

Deeplearning4j LSTM output size

In my case, the input is a List<List<Float>> (a list of word representation vectors), and the output is a single Double per sequence.

So I build the following structure (first index - example number, second - sentence item number, third - word vector element number): http://pastebin.com/KGdjwnki

And the output: http://pastebin.com/fY8zrxEL

But when I pass one of these (http://pastebin.com/wvFFC4Hw) to model.output, I get the vector [0.25, 0.24, 0.25, 0.25], not a single value.

What could be wrong? The code is attached (in Kotlin). classCount is one.

import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.deeplearning4j.nn.conf.NeuralNetConfiguration.Builder
import org.deeplearning4j.nn.api.OptimizationAlgorithm
import org.deeplearning4j.nn.conf.Updater
import org.deeplearning4j.nn.weights.WeightInit
import org.deeplearning4j.nn.conf.layers.GravesLSTM
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer
import org.deeplearning4j.nn.conf.BackpropType
import org.nd4j.linalg.api.ndarray.INDArray
import org.nd4j.linalg.cpu.nativecpu.NDArray
import org.nd4j.linalg.indexing.NDArrayIndex
import org.nd4j.linalg.factory.Nd4j
import org.nd4j.linalg.lossfunctions.LossFunctions
import java.util.*

class ClassifierNetwork(wordVectorSize: Int, classCount: Int) {
    data class Dimension(val x: Array<Int>, val y: Array<Int>)
    val model: MultiLayerNetwork
    val optimization = OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT
    val iterations = 1
    val learningRate = 0.1
    val rmsDecay = 0.95
    val seed = 12345
    val l2 = 0.001
    val weightInit = WeightInit.XAVIER
    val updater = Updater.RMSPROP
    val backpropType = BackpropType.TruncatedBPTT
    val tbpttLength = 50
    val epochs = 50
    var dimensions = Dimension(intArrayOf(0).toTypedArray(), intArrayOf(0).toTypedArray())

    init {
        val baseConfiguration = Builder().optimizationAlgo(optimization)
                .iterations(iterations).learningRate(learningRate).rmsDecay(rmsDecay).seed(seed).regularization(true).l2(l2)
                .weightInit(weightInit).updater(updater)
                .list()
        baseConfiguration.layer(0, GravesLSTM.Builder().nIn(wordVectorSize).nOut(64).activation("tanh").build())
        baseConfiguration.layer(1, GravesLSTM.Builder().nIn(64).nOut(32).activation("tanh").build())
        baseConfiguration.layer(2, GravesLSTM.Builder().nIn(32).nOut(16).activation("tanh").build())
        baseConfiguration.layer(3, RnnOutputLayer.Builder().lossFunction(LossFunctions.LossFunction.MCXENT)
                .activation("softmax").weightInit(WeightInit.XAVIER).nIn(16).nOut(classCount).build())
        val cfg = baseConfiguration.build()!!
        cfg.backpropType = backpropType
        cfg.tbpttBackLength = tbpttLength
        cfg.tbpttFwdLength = tbpttLength
        cfg.isPretrain = false
        cfg.isBackprop = true
        model = MultiLayerNetwork(cfg)
    }

    private fun dataDimensions(x: List<List<Array<Double>>>, y: List<Array<Double>>): Dimension {
        assert(x.size == y.size)
        val exampleCount = x.size
        assert(x.size > 0)
        val sentenceLength = x[0].size
        assert(sentenceLength > 0)
        val wordVectorLength = x[0][0].size
        assert(wordVectorLength > 0)
        val classCount = y[0].size
        assert(classCount > 0)
        return Dimension(
                intArrayOf(exampleCount, wordVectorLength, sentenceLength).toTypedArray(),
                intArrayOf(exampleCount, classCount).toTypedArray()
        )
    }

    data class Fits(val x: INDArray, val y: INDArray)
    private fun fitConversion(x: List<List<Array<Double>>>, y: List<Array<Double>>): Fits {
        val dim = dataDimensions(x, y)
        val xItems = ArrayList<INDArray>()
        for (i in 0..dim.x[0]-1) {
            val itemList = ArrayList<DoubleArray>()
            for (j in 0..dim.x[1]-1) {
                val rowList = ArrayList<Double>()
                for (k in 0..dim.x[2]-1) {
                    // Transpose word index and vector element: the input is
                    // [example][word][element], ND4J wants [example, element, timeStep].
                    rowList.add(x[i][k][j])
                }
                itemList.add(rowList.toTypedArray().toDoubleArray())
            }
            xItems.add(Nd4j.create(itemList.toTypedArray()))
        }
        val xFits = Nd4j.create(xItems, dim.x.toIntArray(), 'c')
        val yItems = ArrayList<DoubleArray>()
        for (i in 0..y.size-1) {
            yItems.add(y[i].toDoubleArray())
        }
        val yFits = Nd4j.create(yItems.toTypedArray())
        return Fits(xFits, yFits)
    }

    private fun error(epoch: Int, x: List<List<Array<Double>>>, y: List<Array<Double>>) {
        var totalDiff = 0.0
        for (i in 0..x.size-1) {
            val source = x[i]
            val result = y[i]
            val realResult = predict(source)
            var diff = 0.0
            for (j in 0..result.size-1) {
                val elementDiff = result[j] - realResult[j]
                diff += Math.pow(elementDiff, 2.0)
            }
            diff = Math.sqrt(diff)
            totalDiff += Math.pow(diff, 2.0)
        }
        totalDiff = Math.sqrt(totalDiff)
        print("Epoch ")
        print(epoch)
        print(", diff ")
        println(totalDiff)
    }

    fun train(x: List<List<Array<Double>>>, y: List<Array<Double>>) {
        dimensions = dataDimensions(x, y)
        val(xFit, yFit) = fitConversion(x, y)
        for (i in 0..epochs-1) {
            model.input = xFit
            model.labels = yFit
            model.fit()
            error(i+1, x, y)
        }
    }

    fun predict(x: List<Array<Double>>): Array<Double> {
        val xList = ArrayList<DoubleArray>()
        for (i in 0..dimensions.x[1]-1) {
            val row = ArrayList<Double>()
            for (j in 0..dimensions.x[2]-1) {
                row.add(x[j][i])
            }
            xList.add(row.toDoubleArray())
        }
        val xItem = Nd4j.create(xList.toTypedArray())
        val y = model.output(xItem)
        val result = ArrayList<Double>()
        // Copy the network output into a plain Double array.
        for (i in 0..y.length()-1) {
            result.add(y.getDouble(i))
        }
        return result.toTypedArray()
    }
}

Update: it seems the following example addresses a similar task, so I'll check it later and post a solution: https://github.com/deeplearning4j/dl4j-0.4-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/recurrent/word2vecsentiment/Word2VecSentimentRNN.java

Upvotes: 2

Views: 1107

Answers (3)

tired and bored dev

Reputation: 693

Not sure if I understand your requirements correctly, but if you want a single output (that is, to predict a number, i.e. regression), you usually go with the identity activation and the MSE loss function. You've used softmax, which is usually used in classification.
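A minimal sketch of that change, assuming the string-based activation API from the question's DL4J version; the nIn of 16 mirrors the question's last GravesLSTM layer, and the name regressionOutput is just a placeholder:

import org.deeplearning4j.nn.conf.layers.RnnOutputLayer
import org.nd4j.linalg.lossfunctions.LossFunctions

// Regression output layer: identity activation and MSE loss instead of
// softmax and MCXENT; a single output unit for the single Double target.
val regressionOutput = RnnOutputLayer.Builder()
        .lossFunction(LossFunctions.LossFunction.MSE)
        .activation("identity")
        .nIn(16)
        .nOut(1)
        .build()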

Upvotes: 0

Marcus V.

Reputation: 6859

In addition to the recommendation to post this in the very active Gitter channel, and Adam's hint to check out the great documentation (which explains how to set up the input and output as rank 3), I want to point out a few other things in your code, since I was struggling with similar problems:

  • Check out the basic example in examples/recurrent/basic/BasicRNNExample.java; there you see that for an RNN you don't use model.output(xItem) but model.rnnTimeStep(xItem) (see the sketch after this list).
  • With a class count of one you seem to be performing a regression; for that, also check out the regression example in examples/feedforward/regression/RegressionSum.java and the documentation here. There you see that the activation function should be "identity". "softmax" actually normalizes the output to sum to one (see the glossary), so if you have just one output it will always output 1 (at least it did for my problem).
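For the first point, a minimal sketch of stepping the network with rnnTimeStep, assuming a single example (miniBatchSize = 1) and a hypothetical sentence: List<DoubleArray> holding the word vectors; model is the question's MultiLayerNetwork:

import org.nd4j.linalg.api.ndarray.INDArray
import org.nd4j.linalg.factory.Nd4j

// rnnTimeStep keeps the hidden state between calls, unlike model.output.
model.rnnClearPreviousState()                  // reset state before a new sequence
var lastOutput: INDArray? = null
for (wordVector in sentence) {
    // One time step as a rank-3 [miniBatchSize = 1, nIn, timeSteps = 1] array.
    val input = Nd4j.create(wordVector, intArrayOf(1, wordVector.size, 1), 'c')
    lastOutput = model.rnnTimeStep(input)
}
val prediction = lastOutput!!.getDouble(0)     // the single output value

This way you get one value at the end of the sequence instead of one per time step.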

Upvotes: 1

Adam Gibson

Reputation: 3205

LSTM input/output can only be rank 3; see http://deeplearning4j.org/usingrnns
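For illustration, a small sketch of that rank-3 layout, with placeholder dimension values standing in for the question's shapes:

import org.nd4j.linalg.factory.Nd4j

// RNN features in DL4J have shape [miniBatchSize, nIn, timeSeriesLength].
val miniBatchSize = 10  // number of examples
val nIn = 100           // word vector size
val timeSteps = 20      // sentence length
val features = Nd4j.create(miniBatchSize, nIn, timeSteps)
// Set one element: example i, vector component j, word position k.
features.putScalar(intArrayOf(0, 0, 0), 1.0)

Labels for an RnnOutputLayer follow the same rank-3 convention, with nOut in place of nIn.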

Upvotes: 1
