Abdul Rahman

Reputation: 1384

Keras Embedding layer output dimensionality

I am confused by the output dimensionality specified for the Embedding layer in this code snippet:

from keras.datasets import imdb
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding, SimpleRNN

max_features = 10000  # vocabulary size: keep only the 10000 most frequent words
maxlen = 500  # pad/truncate each review to 500 timesteps
batch_size = 32

print('Loading data...')
(input_train, y_train), (input_test, y_test) = imdb.load_data(num_words=max_features)

print(len(input_train), 'train sequences')
print(len(input_test), 'test sequences')
print('Pad sequences (samples x time)')

input_train = sequence.pad_sequences(input_train, maxlen=maxlen)
input_test = sequence.pad_sequences(input_test, maxlen=maxlen)

print('input_train shape:', input_train.shape)
print('input_test shape:', input_test.shape)

print(input_train)

model = Sequential()
model.add(Embedding(max_features, 32))  # Embedding(input_dim=max_features, output_dim=32)
model.add(SimpleRNN(32))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])

Since max_features is 10000, shouldn't the Embedding layer have an output dimensionality of 10000?

Upvotes: 0

Views: 1745

Answers (2)

Statistic Dean

Reputation: 5270

The output dimensionality of the embedding is the dimension of the vector used to represent each word. In your case, you use a 32-dimensional vector to represent each of the 10,000 words that can appear in your dataset.
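As a minimal sketch (using the same layer sizes as in the question, not the full model), you can see that the layer's output has 32 features per timestep, not 10000:

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

# Embedding(10000, 32): 10000 possible word indices, each mapped to a 32-dim vector.
model = Sequential()
model.add(Embedding(10000, 32))

# A dummy batch of 2 padded sequences of length 500 (integer word indices).
dummy_batch = np.random.randint(0, 10000, size=(2, 500))
print(model.predict(dummy_batch).shape)  # (2, 500, 32)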

Upvotes: 1

Anna Krogager

Reputation: 3588

max_features is the number of words in the vocabulary, not the dimensionality of the embedding. In your Embedding layer you have 10000 words, each represented as an embedding of dimension 32.
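A quick sketch (again assuming the layer sizes from the question) makes this concrete: the layer's weight matrix holds one 32-dimensional vector per word, so the weights have shape (10000, 32) and the output is a lookup into those rows:

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

model = Sequential()
model.add(Embedding(10000, 32))

# Run one dummy batch so the layer's weights are created.
model.predict(np.zeros((1, 500), dtype='int32'))

# One 32-dimensional vector per word index: shape (10000, 32),
# i.e. 320,000 trainable parameters.
print(model.layers[0].get_weights()[0].shape)  # (10000, 32)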

Upvotes: 2
