Reputation: 4083
I want to predict computer actions from dialog data between a human and a computer. I have 1000 dialogs for training. Each dialog has a different number of turns.
My reference paper (https://arxiv.org/abs/1702.03274) explains the training as quoted below. It uses a basic LSTM.
In training, each dialog formed one minibatch, and updates were done on full rollouts (i.e., non-truncated back propagation through time).
I have two questions about this.
I am not a native English speaker and not an expert in machine learning. Any help will be appreciated. Thank you.
I added more details. User inputs are translated to feature vectors, and system actions are translated to one-hot vectors, so this is a multi-class classification problem. In the paper, this task is tackled with a single LSTM model.
dialog 1
t1: hello ([1,0,1,0]) -> hi ([0,0,1,0])
t2: how are you ([0,1,1,0]) -> fine ([0,1,0,0])
dialog 2
t1: hey ([1,0,1,0]) -> hi ([0,0,1,0])
...
dialog 1000
...
So the problem is to predict y from x:
dialog_list = [[(x1, y1), (x2, y2)], [(x1, y1)], ...]  # length is 1000
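For concreteness, here is a minimal sketch of that layout in Python (numpy and all names besides dialog_list are my own choices):

    import numpy as np

    # dialog_list: 1000 dialogs, each a list of (x, y) turn pairs
    dialog_list = [
        [  # dialog 1: two turns
            (np.array([1, 0, 1, 0], dtype=np.float32),   # "hello" features
             np.array([0, 0, 1, 0], dtype=np.float32)),  # action "hi"
            (np.array([0, 1, 1, 0], dtype=np.float32),   # "how are you" features
             np.array([0, 1, 0, 0], dtype=np.float32)),  # action "fine"
        ],
        # ... 999 more dialogs, each with its own number of turns
    ]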
Upvotes: 1
Views: 439
Reputation: 2189
Let me explain your quote. First, let's make an assumption about the data. I assume that a dialog with 4 turns means person A says something, then B responds, then A, then B. You could then format the data as (input, response) pairs, along these lines:
[A speaks sequence 1] -> [B speaks sequence 2]
[B speaks sequence 2] -> [A speaks sequence 3]
[A speaks sequence 3] -> [B speaks sequence 4]
Notice this dialog now has duplications; we do this so that each "response", i.e. the second sentence of a pair, is linked to the sentence before it. This way of formatting your data is useful for an Encoder/Decoder LSTM: the first sequence goes into the Encoder, the second into the Decoder. Each pair is one data sample, so this dialog gives 3 samples.
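As an illustration (my code, not the paper's), building those pairs from a list of turns:

    # One 4-turn dialog -> 3 (input, response) samples with duplication
    turns = ["A speaks sequence 1", "B speaks sequence 2",
             "A speaks sequence 3", "B speaks sequence 4"]
    samples = [(turns[i], turns[i + 1]) for i in range(len(turns) - 1)]
    # [('A speaks sequence 1', 'B speaks sequence 2'),
    #  ('B speaks sequence 2', 'A speaks sequence 3'),
    #  ('A speaks sequence 3', 'B speaks sequence 4')]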
In training, each dialog formed one minibatch,
The dialog above can be a batch of 3 samples, and we can do this for all dialogs, so each dialog is a batch. With mini-batch training, a batch goes through the network in the forward pass, after which you immediately perform backpropagation and update your parameters. So yes, 1000 dialogs means 1000 mini-batches.
and updates were done on full rollouts (i.e., non-truncated back propagation through time).
As I explained above, updates are done immediately after the forward pass of your batch. That means for one epoch (i.e. going through all your data once) there are 1000 updates.
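Here is a minimal full-rollout training sketch. PyTorch is my own choice (the paper does not prescribe a framework), and I follow your framing of one action label per turn, using the dialog_list structure from the question:

    import torch
    import torch.nn as nn

    n_features, n_actions, hidden_size = 4, 4, 32
    lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
    head = nn.Linear(hidden_size, n_actions)
    opt = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()))
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(10):
        for dialog in dialog_list:  # 1000 dialogs -> 1000 updates per epoch
            x = torch.stack([torch.as_tensor(f) for f, _ in dialog])           # (T, n_features)
            y = torch.stack([torch.as_tensor(a) for _, a in dialog]).argmax(1)  # (T,) class ids
            out, _ = lstm(x.unsqueeze(0))   # full rollout over all T turns
            logits = head(out.squeeze(0))   # (T, n_actions)
            loss = loss_fn(logits, y)
            opt.zero_grad()
            loss.backward()                 # non-truncated BPTT over the whole dialog
            opt.step()                      # exactly one update per dialog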
When working with RNNs, if sequences are too long, we can break them up. For example, we could split the sample [A speaks sequence 1. B speaks sequence 2.] into 3 chunks: [A speaks sequence], [1. B speaks], [sequence 2.]. We feed the first chunk into the network and backpropagate, then feed the second chunk and backpropagate, then the third. However, we need to save the last hidden state of chunk 1 to give to the beginning of chunk 2, and save the last state of chunk 2 to give to chunk 3. This is known as truncated backpropagation through time (TBPTT). If you "fully unroll" the sequence before backpropagating, you are not doing TBPTT: for each batch you update the network only once, not 3 times as in my example.
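For contrast, a TBPTT sketch, reusing the model, optimizer, and loss function from the sketch above; here x is one long input of shape (1, T, n_features), y holds T per-step class ids, and chunk_len is an arbitrary choice of mine:

    chunk_len = 20
    state = None
    for start in range(0, x.size(1), chunk_len):
        chunk = x[:, start:start + chunk_len]      # next slice of the long sequence
        out, state = lstm(chunk, state)            # carry hidden state across chunks
        state = tuple(s.detach() for s in state)   # but cut the gradient history here
        logits = head(out.squeeze(0))
        loss = loss_fn(logits, y[start:start + chunk_len])
        opt.zero_grad()
        loss.backward()                            # backprop only within this chunk
        opt.step()                                 # one update per chunk, not per sequence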
Hope that helps.
Upvotes: 1