Christian

Reputation: 3403

Seq2Seq with Keras understanding

For some self-study, I'm trying to implement a simple sequence-to-sequence model using Keras. While I get the basic idea, and several tutorials are available online, I still struggle with some basic concepts when reading these tutorials:

Is it correct to say that these are indeed two different approaches to Seq2Seq? Which one might be better, and why? Or am I misreading the second tutorial? I already have an understanding of sequence classification and sequence labeling, but sequence-to-sequence hasn't properly clicked for me yet.

Upvotes: 4

Views: 2082

Answers (1)

Littleone

Reputation: 646

Yes, those are indeed two different approaches, and there are other variations as well. The MachineLearningMastery tutorial simplifies things a bit to make them accessible. I believe the Keras method might perform better, and it is what you will need if you want to advance to seq2seq with attention, which is almost always the case.

MachineLearningMastery uses a hacky workaround that lets the model work without feeding any inputs to the decoder: it simply repeats the encoder's last hidden state and passes that as the decoder input at every timestep. This is not a flexible solution.

    model.add(RepeatVector(tar_timesteps))
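
For context, that line sits inside a plain Sequential model along these lines. This is only a rough sketch; names like src_vocab (source vocabulary size), tar_vocab, src_timesteps and n_units are placeholders you would define for your own data:

    from keras.models import Sequential
    from keras.layers import Embedding, LSTM, RepeatVector, TimeDistributed, Dense

    model = Sequential()
    # Encoder: embed the source sequence and compress it into one fixed-size vector
    model.add(Embedding(src_vocab, n_units, input_length=src_timesteps, mask_zero=True))
    model.add(LSTM(n_units))
    # The workaround: copy that single vector once per output timestep
    model.add(RepeatVector(tar_timesteps))
    # Decoder: unroll over the target timesteps and predict a word at each one
    model.add(LSTM(n_units, return_sequences=True))
    model.add(TimeDistributed(Dense(tar_vocab, activation='softmax')))
    # Targets are one-hot vectors of shape (batch, tar_timesteps, tar_vocab)
    model.compile(optimizer='adam', loss='categorical_crossentropy')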

The Keras tutorial, on the other hand, introduces several other concepts, such as teacher forcing (feeding the targets as inputs to the decoder), the deliberate absence of embeddings, and a lengthier inference process, but it should set you up for attention.
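
For comparison, the training side of that approach looks roughly like this (inference uses a separate pair of encoder/decoder models and is omitted here; latent_dim, num_encoder_tokens and num_decoder_tokens follow the tutorial's naming):

    from keras.models import Model
    from keras.layers import Input, LSTM, Dense

    # Encoder: discard the output sequence, keep only the final hidden and cell states
    encoder_inputs = Input(shape=(None, num_encoder_tokens))
    _, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)
    encoder_states = [state_h, state_c]

    # Decoder with teacher forcing: the target sequence (shifted by one step)
    # is fed in as the decoder input, initialised with the encoder states
    decoder_inputs = Input(shape=(None, num_decoder_tokens))
    decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
    decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(decoder_outputs)

    model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
    model.compile(optimizer='rmsprop', loss='categorical_crossentropy')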

I would also recommend the PyTorch tutorial, which I feel takes the most appropriate approach.

Edit: I don't know your task, but for word embeddings what you would want is

    x = Embedding(num_encoder_tokens, latent_dim)(encoder_inputs)

Before that, you need to map every word in the vocabulary to an integer, turn every sentence into a sequence of integers, and pass those sequences to the model (with an embedding layer of latent_dim, say 120). Each of your words is then represented by a vector of size 120. Your input sentences must also all be the same length, so find an appropriate maximum sentence length, truncate longer sentences to it, and pad shorter ones with zeros, where 0 represents a null word.
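
As a rough sketch of that preprocessing with the Keras utilities (the two sentences here are just made-up placeholders):

    from keras.preprocessing.text import Tokenizer
    from keras.preprocessing.sequence import pad_sequences

    sentences = ['how are you', 'i am fine thanks']  # hypothetical toy corpus

    # Map every word to an integer; index 0 is reserved for padding
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(sentences)
    sequences = tokenizer.texts_to_sequences(sentences)

    # Pick a max length and zero-pad shorter sentences up to it
    max_len = max(len(s) for s in sequences)
    padded = pad_sequences(sequences, maxlen=max_len, padding='post')

The padded array of integers can then be fed straight into the Embedding layer above.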

Upvotes: 6
