Ash

Reputation: 3550

Get the probability distribution of the next word given a sequence using TensorFlow's RNN (LSTM) language model?

I'm running TensorFlow's RNN (LSTM) language model example here. It runs and reports the perplexities perfectly.

What I want, though, is three things:

  1. Given a sequence (e.g. w1 w5 w2000 w750), give me the probability distribution for the next word over the vocabulary. I don't know how to do this with the model in the tutorial.

  2. I want the model to return a ranking of the most probable sequences (e.g. n-grams), where n can be given as input.

  3. Given a sequence, I want its probability.

I'm new to TensorFlow and RNNs, so please tell me if you need more information than I have provided.

The code for the language model is here.

Upvotes: 1

Views: 2965

Answers (2)

Minura Punchihewa

Reputation: 2025

I know this may be coming a little late, but I will answer anyway. With TensorFlow 2, it is possible to obtain the probability distribution over the model's output classes using the model.predict_proba() function. In the context of a language model, this produces the probability distribution for the next word in the sequence over the vocabulary you have used.
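
For illustration, here is a minimal sketch assuming a hypothetical Keras model (names like lm_model and the layer sizes are illustrative, not from the tutorial). With a softmax output layer, model.predict() itself returns the probability distribution over the vocabulary:

python

import numpy as np
import tensorflow as tf

vocab_size = 2000   # assumed vocabulary size
embed_dim = 64

# Hypothetical language model: predicts the next word id from a sequence.
lm_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(vocab_size, activation="softmax"),
])

sequence = np.array([[1, 5, 1999, 750]])        # batch of one sequence of word ids
next_word_dist = lm_model.predict(sequence)[0]  # shape (vocab_size,), sums to 1
print(next_word_dist.argmax())                  # id of the most probable next word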

As for your second question, I do not know if it is possible. To my understanding, it would mean training your language model a little differently: I am assuming that previously you used the last word of the sequence as your label, but in this case you could use an n-gram sequence instead (see the sketch below).
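
For illustration, a minimal sketch of what that training-data preparation might look like; make_ngram_targets is a hypothetical helper, not the tutorial's code:

python

def make_ngram_targets(token_ids, context_len, n):
    """token_ids: a list of word ids; returns (context, next n-gram) pairs."""
    pairs = []
    for i in range(len(token_ids) - context_len - n + 1):
        context = token_ids[i : i + context_len]
        target = token_ids[i + context_len : i + context_len + n]
        pairs.append((context, target))
    return pairs

# Example: contexts of 4 words, predicting the next 2-gram.
pairs = make_ngram_targets([1, 5, 1999, 750, 42, 7, 13], context_len=4, n=2)
# pairs == [([1, 5, 1999, 750], [42, 7]), ([5, 1999, 750, 42], [7, 13])]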

The last question that you have asked is a problem that I am currently facing as well. If you were able to find the answer to this problem, please let me know.

Upvotes: 2

Jie.Zhou

Reputation: 1318

I'm new to TensorFlow and RNNs too, so here's my thinking on your questions.
Assuming you have a corpus of 2,000 words (too small in practice), the output of the i-th LSTM step is a vector with 2,000 elements, each corresponding to a probability, and this vector is the predicted probability distribution for the (i+1)-th word.
Back to your questions.

  1. You just need to feed the input [w1, w5, w2000, w750] to the RNN. You get four output vectors, each with 2,000 elements (the number of words in the corpus). Pick the last output vector: that is the predicted probability distribution for the 5th word, and you can also take an argmax over it to find the most probable word for the 5th position.

  2. I have no idea how to do this, even though I can assign a probability to any given sequence.

  3. Again considering your input [w1, w5, w2000, w750]: after running the RNN you have four output vectors, denoted [v1, v2, v3, v4]. You just need to find the probability of w5 in v1, of w2000 in v2, and of w750 in v3, then multiply these probabilities; that is the probability of your input. (v4 is not used because it predicts the word after the sequence; w1 is not scored either, because it is usually the start token.) A sketch of this computation follows the list.
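
For illustration, a minimal NumPy sketch of point 3; the vectors v1..v4 here are random placeholders standing in for the model's real per-step outputs:

python

import numpy as np

# Per-step next-word distributions from the RNN, shape (vocab_size,) each.
# v1 scores the word after w1, v2 the word after w5, and so on.
vocab_size = 2000
rng = np.random.default_rng(0)
v1, v2, v3, v4 = [rng.dirichlet(np.ones(vocab_size)) for _ in range(4)]

w5, w2000, w750 = 5, 1999, 750   # example word ids (0-indexed)

# P(sequence) = P(w5 | w1) * P(w2000 | w1, w5) * P(w750 | w1, w5, w2000)
sequence_prob = v1[w5] * v2[w2000] * v3[w750]

# Multiplying many small probabilities underflows; in practice sum log-probs.
sequence_log_prob = np.log(v1[w5]) + np.log(v2[w2000]) + np.log(v3[w750])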

Edit:

Once you have trained your model, you should have an embedding matrix embedding, an RNN cell cell, and softmax weights/biases softmax_w / softmax_b; you can generate outputs using these three things.

python

def inference(inputs):
    """
    inputs: a list containing a sequence of word ids
    """
    outputs = []
    state = cell.zero_state(1, tf.float32)  # batch size 1: only one sequence
    # [inputs] adds a batch dimension: embed has shape [1, seq_len, embed_size]
    embed = tf.nn.embedding_lookup(embedding, [inputs])
    sequence_length = len(inputs)
    for i in range(sequence_length):
        # Feed one time step; embed[:, i, :] has shape [1, embed_size]
        cell_output, state = cell(embed[:, i, :], state)
        # Project the cell output onto the vocabulary and normalize
        logits = tf.nn.xw_plus_b(cell_output, softmax_w, softmax_b)
        probability = tf.nn.softmax(logits)
        outputs.append(probability)
    return outputs

The final output is a list containing len(inputs) tensors; you can use sess.run(tensor) to get the value of a tensor as a numpy.array.
This is just a simple function I wrote, but it should give you a general idea of how to generate outputs once you finish training; a usage example follows.
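
For example, a hypothetical use of inference() in a TF1-style session (word_to_id is an assumed vocabulary lookup, not defined in the snippet above):

python

# Hypothetical usage of inference() inside a TF1-style session.
input_ids = [word_to_id[w] for w in ["w1", "w5", "w2000", "w750"]]
output_tensors = inference(input_ids)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # or restore a trained checkpoint
    step_probs = sess.run(output_tensors)        # list of (1, vocab_size) arrays

next_word_dist = step_probs[-1][0]   # distribution for the word after w750
print(next_word_dist.argmax())       # id of the most probable next word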

Upvotes: 2
