Reputation: 4235
Using Python 2.7 Anaconda on Windows 10
I have trained a GRU neural network to build a language model in Keras:
from keras.models import Sequential
from keras.layers import GRU, Dropout, Dense, Activation

print('Build model...')
model = Sequential()
model.add(GRU(512, return_sequences=True, input_shape=(maxlen, len(chars))))
model.add(Dropout(0.2))
model.add(GRU(512, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
How do I calculate the perplexity of this language model? For example, NLTK offers a perplexity calculation function for its models.
Upvotes: 2
Views: 5529
Reputation: 2128
I see that you have also followed the Keras tutorial on language modelling, which to my understanding is not entirely correct. The problem is that a language model should estimate the probability of every subsequence, i.e. P(c_1, c_2, ..., c_N) = P(c_1) P(c_2 | c_1) ... P(c_N | c_{N-1}, ..., c_1). However, assuming your input is a matrix of shape sequence_length x #characters and your target is the single character following the sequence, the output of your model will only yield the last term, P(c_N | c_{N-1}, ..., c_1).
Since the perplexity is P(c_1, c_2, ..., c_N)^{-1/N}, you need all of the conditional terms, not just the last one. This is why I recommend using the TimeDistributedDense layer. It will give you a matrix of shape sequence_length x #characters, where every row is a probability distribution over the characters; call it proba.
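For illustration, here is a minimal sketch of the modified model, reusing maxlen and chars from the question (in newer Keras versions TimeDistributedDense is spelled TimeDistributed(Dense(...))). Note that both GRU layers must now return sequences so that the Dense layer is applied at every time step:

from keras.models import Sequential
from keras.layers import GRU, Dropout, Dense, Activation, TimeDistributed

model = Sequential()
model.add(GRU(512, return_sequences=True, input_shape=(maxlen, len(chars))))
model.add(Dropout(0.2))
model.add(GRU(512, return_sequences=True))  # was return_sequences=False
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(len(chars))))  # one distribution per time step
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')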
From every row of proba, you need the column that contains the prediction for the correct character:
correct_proba = proba[np.arange(maxlen), yTest]
where yTest is a vector containing the index of the correct character at every time step.
Then the perplexity for a sequence (and you have to average over all your training sequences) is
np.power(2, -np.sum(np.log2(correct_proba)) / maxlen)
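Putting it together, here is a sketch of the whole computation for one sequence; xTest (a one-hot-encoded input of shape (maxlen, len(chars))) and sequence_perplexity are illustrative names, not part of the question's code:

import numpy as np

def sequence_perplexity(proba, yTest, maxlen):
    # proba: (maxlen, len(chars)) matrix of per-step distributions
    # yTest: index of the correct character at every time step
    correct_proba = proba[np.arange(maxlen), yTest]   # P(c_t | c_1 ... c_{t-1})
    # 2^(-1/N * sum log2 P) == P(c_1, ..., c_N)^(-1/N)
    return np.power(2, -np.sum(np.log2(correct_proba)) / maxlen)

proba = model.predict(xTest[np.newaxis, ...])[0]      # shape (maxlen, len(chars))
print(sequence_perplexity(proba, yTest, maxlen))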
P.S. I would rather have written the explanation in LaTeX.
Upvotes: 6