Reputation: 128
I am trying to replicate the paper "NLP (Almost) from Scratch" using deeplearning4j. So far I have built the following network configuration:
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(seed)
    .iterations(iterations)
    .learningRate(1e-8f)
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
    .list(2)
    // Hidden layer: input is the concatenated word vectors of the context window
    .layer(0, new DenseLayer.Builder()
        .nIn(wordVecLayers * windowSize).nOut(hiddenSize)
        .activation("relu")
        .weightInit(WeightInit.DISTRIBUTION)
        .dist(new UniformDistribution(-2.83 / Math.sqrt(hiddenSize), 2.83 / Math.sqrt(hiddenSize)))
        .biasInit(0.0f)
        .build())
    // Output layer: softmax over the tag set
    .layer(1, new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD)
        .nIn(hiddenSize).nOut(types.size())
        .activation("softmax")
        .weightInit(WeightInit.DISTRIBUTION)
        .dist(new UniformDistribution(-2.83 / Math.sqrt(hiddenSize), 2.83 / Math.sqrt(hiddenSize)))
        .biasInit(0.0f)
        .build())
    .backprop(true).pretrain(false)
    .build();
I have tried many different configurations, but none of them has worked for me: the model keeps predicting the 'O' tag for every word. I would appreciate it if you could point out what is wrong with my approach, and what steps I should take next. Thank you!
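In case the problem is in my feature construction rather than the network configuration, this is roughly how I assemble the window input that feeds `nIn(wordVecLayers * windowSize)`: the word vectors of the `windowSize` tokens centred on the target word are concatenated, with zero padding past sentence boundaries. (The vector values below are made up for illustration; in my code they come from a pretrained word2vec lookup table.)

```java
// Sketch of the window feature construction feeding nIn(wordVecLayers * windowSize).
public class WindowFeatures {
    // Concatenate the vectors of the windowSize tokens centred on position `pos`,
    // zero-padding where the window extends past the sentence boundaries.
    static double[] windowInput(double[][] sentenceVectors, int pos,
                                int windowSize, int wordVecLayers) {
        double[] input = new double[windowSize * wordVecLayers];
        int half = windowSize / 2;
        for (int w = 0; w < windowSize; w++) {
            int tokenIdx = pos - half + w;
            if (tokenIdx >= 0 && tokenIdx < sentenceVectors.length) {
                System.arraycopy(sentenceVectors[tokenIdx], 0,
                                 input, w * wordVecLayers, wordVecLayers);
            } // else: leave zeros (padding)
        }
        return input;
    }

    public static void main(String[] args) {
        int wordVecLayers = 3, windowSize = 5;
        double[][] sentence = {          // 4 tokens, 3-dim toy vectors
            {0.1, 0.2, 0.3},
            {0.4, 0.5, 0.6},
            {0.7, 0.8, 0.9},
            {1.0, 1.1, 1.2}
        };
        double[] x = windowInput(sentence, 0, windowSize, wordVecLayers);
        System.out.println(x.length);    // 15 = windowSize * wordVecLayers
        System.out.println(x[0]);        // 0.0 (left padding for the first token)
    }
}
```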
Upvotes: 3
Views: 1710
Reputation: 3205
Our word2vec sentiment example is a good place to start: https://github.com/deeplearning4j/dl4j-0.4-examples/tree/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/recurrent/word2vecsentiment
This covers sequence labeling over word vectors, which is the same setup as NER.
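To make that framing concrete, here is a toy sketch (plain Java, not DL4J): sequence labeling assigns one tag per token, so a tagger is just a function from a sequence of per-token score vectors to a same-length sequence of tags. The hard-coded scores below are a stand-in for what a trained network would output.

```java
import java.util.Arrays;

// Toy illustration of the sequence-labeling framing: one tag per token,
// chosen as the argmax of that token's score vector.
public class SequenceLabeling {
    static final String[] TAGS = {"O", "PER", "LOC"};

    static String[] tagSequence(double[][] tokenScores) {
        String[] tags = new String[tokenScores.length];
        for (int t = 0; t < tokenScores.length; t++) {
            int best = 0;
            for (int k = 1; k < tokenScores[t].length; k++) {
                if (tokenScores[t][k] > tokenScores[t][best]) best = k;
            }
            tags[t] = TAGS[best];
        }
        return tags;
    }

    public static void main(String[] args) {
        // Made-up scores for the tokens of "John lives in Paris"
        double[][] scores = {
            {0.1, 0.8, 0.1},   // John  -> PER
            {0.9, 0.05, 0.05}, // lives -> O
            {0.7, 0.1, 0.2},   // in    -> O
            {0.2, 0.1, 0.7}    // Paris -> LOC
        };
        System.out.println(Arrays.toString(tagSequence(scores)));
        // prints [PER, O, O, LOC]
    }
}
```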
Upvotes: 2