Reputation: 1384
I am confused about the output dimension specified for the Embedding layer in this code snippet:
from keras.datasets import imdb
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding, SimpleRNN
max_features = 10000  # vocabulary size: keep only the 10000 most frequent words
maxlen = 500          # pad/truncate every review to 500 word indices
batch_size = 32
print('Loading data...')
(input_train, y_train), (input_test, y_test) = imdb.load_data(num_words=max_features)
print(len(input_train), 'train sequences')
print(len(input_test), 'test sequences')
print('Pad sequences (samples x time)')
input_train = sequence.pad_sequences(input_train, maxlen=maxlen)
input_test = sequence.pad_sequences(input_test, maxlen=maxlen)
print('input_train shape:', input_train.shape)
print('input_test shape:', input_test.shape)
print(input_train)
model = Sequential()
model.add(Embedding(max_features, 32))  # input_dim=10000 words, output_dim=32
model.add(SimpleRNN(32))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
Since max_features is 10000, shouldn't the Embedding layer have an output dimensionality of 10000?
Upvotes: 0
Views: 1745
Reputation: 5270
The output dimensionality of the embedding is the size of the vector used to represent each word. In your case, you use a 32-dimensional vector to represent each of the 10000 words that might appear in your dataset.
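A minimal sketch you can run to see this (the exact input values are arbitrary; the shapes are the point):

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

model = Sequential()
model.add(Embedding(10000, 32))
batch = np.random.randint(0, 10000, size=(1, 500))  # one padded review: 500 word indices
print(model.predict(batch).shape)  # (1, 500, 32): one 32-dim vector per word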
Upvotes: 1
Reputation: 3588
max_features is the number of words, not the embedding dimensionality. In your Embedding layer you have 10000 words, each represented as an embedding of dimension 32.
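You can confirm this from the layer's weight matrix, which has one 32-dimensional row per vocabulary word (a quick illustrative check):

from keras.layers import Embedding

layer = Embedding(10000, 32)
layer.build((None,))                 # create the weight matrix
print(layer.get_weights()[0].shape)  # (10000, 32): one row per word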
Upvotes: 2