Reputation: 353
I believe this is a unique problem, but definitely link me to other answers elsewhere if they exist. I have a convolutional sequential network in Keras, very similar to the one in the guide to the sequential model (and here is their model):
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import Embedding
from keras.layers import Conv1D, GlobalAveragePooling1D, MaxPooling1D

# seq_length is the number of timesteps (up to ~1,000,000 in my case),
# with 100 features per timestep
model = Sequential()
model.add(Conv1D(64, 3, activation='relu', input_shape=(seq_length, 100)))
model.add(Conv1D(64, 3, activation='relu'))
model.add(MaxPooling1D(3))
model.add(Conv1D(128, 3, activation='relu'))
model.add(Conv1D(128, 3, activation='relu'))
model.add(GlobalAveragePooling1D())
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))  # one label for the whole sequence

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

model.fit(x_train, y_train, batch_size=16, epochs=10)
score = model.evaluate(x_test, y_test, batch_size=16)
Unfortunately, my sequences are very long (up to a million timesteps), and I would really like to use an embedding. On top of that I'd like to do 2D convolution (and possibly much deeper architectures). My GPU is fast enough, since convolution is cheap, but it only has 2GB of memory. Because of that, I cannot even train the network one sample at a time: as soon as I introduce an embedding, the network blows up in size - in this example, to (batch_size, 1000000, 100, embed_size).
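To put rough numbers on that (embed_size = 100 below is just an assumption for illustration):
# back-of-the-envelope memory for ONE embedded sample, float32 = 4 bytes
timesteps, features, embed_size = 1000000, 100, 100
bytes_per_sample = timesteps * features * embed_size * 4
print(bytes_per_sample / 1e9)  # ~40 GB for a single sample, versus 2 GB of GPU memory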
I know about fit_generator, but using fit_generator (together with TimeseriesGenerator) requires me to have a label for every window of the chopped-up sequence. My problem is a simple whole-sequence classification problem, so it makes no sense to provide a label after, say, the first 1000 timesteps of the sequence rather than only after all million. My impression is that the network then runs GlobalAveragePooling on every chunk of the broken-up sequence separately. As evidence, when I compare fit_generator to a regular fit on a small dataset, fit_generator's performance suffers greatly.
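For illustration, here is roughly what I mean (a toy sketch with scaled-down sizes; the label value and window sizes are made up):
import numpy as np
from keras.preprocessing.sequence import TimeseriesGenerator
# one long sequence, scaled down for illustration
steps, features = 10000, 100
x = np.random.rand(steps, features).astype('float32')
# TimeseriesGenerator wants a target per timestep, so the single
# sequence-level label has to be tiled across every step
y = np.full(steps, 1.0)
gen = TimeseriesGenerator(x, y, length=1000, stride=1000, batch_size=16)
xb, yb = gen[0]
print(xb.shape, yb.shape)  # (9, 1000, 100) (9,) -> one label per 1000-step window
Every 1000-step window gets its own label, which is exactly what does not make sense for whole-sequence classification.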
So my question is: what can I use to train a large network on extremely long sequences in Keras? Is it possible I am misunderstanding fit_generator? Or is there some other way to break long sequences into parts? If nothing like this exists, I can probably write it myself and submit it to Keras, but I would rather not.
This is NOT the same problem as an LSTM with extremely long sequences, because I do not care about truncated backpropagation through time (TBPTT), and convolutional networks have no state.
Upvotes: 3
Views: 614
Reputation: 353
If anyone stumbles on this: to solve the issue I simply used a max pooling layer (not average pooling) of size 10 as the first layer. This reduces the number of items in the sequence by a factor of 10, leaving enough memory for the embedding layer.
It performs great, so I don't think shrinking the input had an adverse effect. Since the integer values assigned to the items are arbitrary, max pooling over them essentially just picks one item at random from each window of 10.
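In rough code, the idea looks something like this (a sketch, not my exact model; the Reshape step, vocab_size and the layer sizes are assumptions):
from keras.models import Sequential
from keras.layers import MaxPooling1D, Reshape, Embedding, Conv1D
from keras.layers import GlobalAveragePooling1D, Dropout, Dense
vocab_size = 10000   # made-up values
embed_size = 100
seq_length = 1000000
model = Sequential()
# feed the raw integer IDs as (seq_length, 1) and downsample them 10x
# BEFORE the embedding, so the embedded tensor is 10x smaller
model.add(MaxPooling1D(pool_size=10, input_shape=(seq_length, 1)))
model.add(Reshape((seq_length // 10,)))        # back to a flat ID sequence
model.add(Embedding(vocab_size, embed_size))   # only seq_length/10 positions to embed
model.add(Conv1D(64, 3, activation='relu'))
model.add(GlobalAveragePooling1D())
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])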
Upvotes: 1
Reputation: 1268
You have a sequence of sentences, and the embedding can only be applied to one sentence at a time, so you need to wrap it in a TimeDistributed layer:
from keras.models import Sequential
from keras.layers import Embedding, TimeDistributed

# plug in your own values
vocab_size = 10000
embed_size = 200
seq_length = 1000000

model = Sequential()
# TimeDistributed applies the same Embedding to each of the
# seq_length sentences (100 tokens each) independently
model.add(TimeDistributed(Embedding(vocab_size, embed_size),
                          input_shape=(seq_length, 100)))
The above gives me an input shape of (None, 1000000, 100) and an output shape of (None, 1000000, 100, 200), with 2 million parameters.
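For what it's worth, you can sanity-check those numbers with model.summary(); the 2 million parameters are just the embedding matrix itself:
model.summary()
# TimeDistributed(Embedding) output shape: (None, 1000000, 100, 200)
# Trainable params: vocab_size * embed_size = 10000 * 200 = 2,000,000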
Upvotes: 2