Reputation: 1419
I have input sequences with the following shape.
shape(1434, 185, 37)
There are 1434 sequences in total, each with a length of 185 characters, and there are 37 unique characters. So the vocab size is as follows.
vocab_size=37
Now when I define my Keras input feeding into an Embedding layer as follows,
user_input = keras.layers.Input(shape=((185,37)), name='Input_1')
user_vec = keras.layers.Flatten()(keras.layers.Embedding(vocab_size, 50, input_length=185, name='Input_1_embed')(user_input))
I get the following error.
Error:
ValueError: "input_length" is 185, but received input has shape (None, 185, 37)
Now when I do the following, I don't get any error, but I am not sure whether it is correct.
user_input = keras.layers.Input(shape=((185, )), name='Input_1')
user_vec = keras.layers.Flatten()(keras.layers.Embedding(vocab_size, 50, input_length=185, name='Input_1_embed')(user_input))
Upvotes: 0
Views: 106
Reputation: 33410
As mentioned in the comments section, the Embedding layer takes integer values (i.e. word or character indices) as input, not one-hot encoded vectors. That's why your second solution works but not the first one. See this answer for more explanation.
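For instance, if your data is currently one-hot encoded with shape (1434, 185, 37), you can recover the integer indices with argmax before feeding it to the model. A minimal sketch (the array names here are just placeholders, not taken from your code):

import numpy as np

# hypothetical stand-in for your data: 1434 sequences of 185 character indices
indices = np.random.randint(0, 37, size=(1434, 185))
one_hot_data = np.eye(37)[indices]        # one-hot encoded, shape (1434, 185, 37)

# recover the integer indices so the data matches the (185,) Input above
int_data = one_hot_data.argmax(axis=-1)   # shape (1434, 185)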
However, if each timestep in your sequences is a vector of integers representing word indices (for example, if each document consists of 185 sentences where each sentence has 37 words), then you need to use the TimeDistributed wrapper to apply the Embedding layer to each timestep:
user_input = keras.layers.Input(shape=(185, 37), name='Input_1')
emb_layer = keras.layers.Embedding(vocab_size, 50, input_length=37, name='Input_1_embed')
user_vec = keras.layers.TimeDistributed(emb_layer)(user_input)
The shape of user_vec would be (None, 185, 37, 50), i.e. an embedding vector of size 50 for each word in each timestep of each sequence.
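If you want to double-check this, a quick sanity check (assuming the user_input and user_vec defined above, and that vocab_size has been set to 37):

# wrap the layers in a Model and inspect the resulting output shape
model = keras.models.Model(inputs=user_input, outputs=user_vec)
print(model.output_shape)  # (None, 185, 37, 50)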
Upvotes: 1