Reputation: 11
I have checked several questions on Stack Overflow and tutorials about Keras and TensorFlow embeddings, but I have found no answer that works for me. Let me explain.
I have a list of 200,000 entries containing 10,376 unique "words". They represent cellular device IDs (IMEI). In this particular instance, I want to process them using the Keras Functional API and eventually merge them with the numerical data once this is solved.
But I can't get past the first step, which is the embedding.
Here is the code:
#example of device
0 jg4M/taYRc2cBJPGa8c8vw==
1 jg4M/taYRc2cBJPGa8c8vw==
2 jg4M/taYRc2cBJPGa8c8vw==
3 chIM3a44QxatbmmjyBFGDQ==
4 PdhyfpkIT8Weslb54thwuQ==
5 lrDcRnK7RtKkvaqaYjliBQ==
#length of the device list
device_len = len(device)
device_len
200000
#unique devices among the 200000
top_words = len(np.unique(device))
top_words
10376
#keras encoded
encoded_docs = [one_hot(d, top_words) for d in device]
#max length of the vector for each word
max_length = 2
padded_docs = pad_sequences(encoded_docs, maxlen=max_length, padding='post')
print(padded_docs)
[[10269 9475]
[10269 9475]
[10269 9475]
...
[ 1340 2630]
[ 7270 0]
[ 2364 9298]]
#converted to tensors
padded_docs = tf.convert_to_tensor(padded_docs)
sess = tf.InteractiveSession()
print(padded_docs.eval())
sess.close()
#here is the network
top_words = 10376
embedding_vector_length = 2
x = Embedding(top_words, embedding_vector_length)(padded_docs)
x = Dense(2, activation='sigmoid')(x)
modelx = Model(inputs=padded_docs, outputs = x)
ValueError: Input tensors to a Model must come from `keras.layers.Input`. Received: Tensor("Const:0", shape=(200000, 2), dtype=int32) (missing previous layer metadata).
I checked similar questions and answers, but I can't find anything that works for me.
If someone can help me, it will be greatly appreciated.
Thank you very much indeed.
Upvotes: 0
Views: 472
Reputation: 86600
You need an Input for your model. padded_docs is not an input tensor created by keras.layers.Input, it's "data".
from keras.layers import Input
inputs = Input((doc_length,))  # doc_length: length of each padded sequence (max_length in the question)
x = Embedding(top_words, embedding_vector_length)(inputs)
x = Dense(2, activation='sigmoid')(x)
modelx = Model(inputs=inputs, outputs = x)
Also, padded_docs needs to be made of "integers", not of one-hot encodings. The Embedding layer needs integers.
It's important to note that you will not pass it as a tensor, but as a regular numpy array, when training with model.fit.
So you need to remove the one_hot and convert_to_tensor parts.
Then you will call model.fit(padded_docs, whatever_outputs, .....etc....)
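As a rough sketch of the integer-encoding step, assuming each device ID should map to exactly one index: np.unique with return_inverse=True returns the unique IDs together with an integer index for every entry. The sample IDs are the ones shown in the question; the names uniques and device_ids are only illustrative. As far as I can tell, the Keras one_hot helper hashes tokens to integers and, with its default filters, splits on characters such as "/", which would explain why most devices in the question became two numbers.

```python
import numpy as np

# Sample device IDs copied from the question.
device = [
    "jg4M/taYRc2cBJPGa8c8vw==",
    "jg4M/taYRc2cBJPGa8c8vw==",
    "jg4M/taYRc2cBJPGa8c8vw==",
    "chIM3a44QxatbmmjyBFGDQ==",
    "PdhyfpkIT8Weslb54thwuQ==",
    "lrDcRnK7RtKkvaqaYjliBQ==",
]

# uniques: sorted array of distinct IDs.
# device_ids: for each entry, the position of its ID inside uniques.
uniques, device_ids = np.unique(device, return_inverse=True)

top_words = len(uniques)                  # vocabulary size for the Embedding layer
padded_docs = device_ids.reshape(-1, 1)   # one integer per device, shape (n_samples, 1)

print(top_words)
print(padded_docs.ravel())
```

With this encoding every sample has length 1, so the Input shape would be (1,) and no padding is needed.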
Upvotes: 1
Reputation: 11132
When creating a Model, the input should be an Input layer, not a tensor.
input = keras.layers.Input((max_length,))
x = Embedding(top_words, embedding_vector_length)(input)
x = Dense(2, activation='sigmoid')(x)
modelx = Model(inputs=input, outputs=x)
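For completeness, here is a minimal end-to-end sketch of the same idea, assuming the tensorflow.keras import path and using random integers as a stand-in for the real padded_docs (the data values and the batch of 8 samples are made up for illustration). The data goes into the model as a plain numpy array, while the Model itself is built from an Input layer.

```python
import numpy as np
from tensorflow.keras.layers import Input, Embedding, Dense
from tensorflow.keras.models import Model

top_words = 10376
max_length = 2
embedding_vector_length = 2

# Hypothetical stand-in for padded_docs: integer indices in [0, top_words).
padded_docs = np.random.randint(0, top_words, size=(8, max_length))

# The model is defined on a symbolic Input, not on the data itself.
inputs = Input(shape=(max_length,), dtype="int32")
x = Embedding(top_words, embedding_vector_length)(inputs)
x = Dense(2, activation="sigmoid")(x)
modelx = Model(inputs=inputs, outputs=x)

# The numpy array is only passed when calling predict (or fit).
preds = modelx.predict(padded_docs, verbose=0)
print(preds.shape)  # (8, 2, 2): batch, sequence length, Dense units
```

Note that the Dense layer is applied to the last axis, so the output keeps the sequence dimension; a Flatten layer before Dense would be one option if a single vector per sample is wanted.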
Upvotes: 0