Reputation: 135
I'm using the Keras library for sequence labeling. I'm already using pre-trained embeddings in my experiments, following the methodology described here: https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html
MY CODE (EMBEDDINGS SAVED INTERNALLY):
from keras.models import Sequential
from keras.layers import Embedding, Dropout, Bidirectional, LSTM, TimeDistributed, Dense
from keras.optimizers import Adam

self._model = Sequential(name='core_sequential')
self._model.add(Embedding(input_dim=weights.shape[0],
                          output_dim=weights.shape[1],
                          weights=[weights],
                          name='embeddings_layer',
                          trainable=False))
self._model.add(Dropout(dropout_rate, name='dropout_layer_1'))
self._model.add(Bidirectional(LSTM(units=300,
                                   return_sequences=distributed,
                                   activation='tanh',
                                   name='lstm_layer'),
                              name='birnn_layer'))
self._model.add(Dropout(dropout_rate, name='dropout_layer_2'))
self._model.add(TimeDistributed(Dense(units=1,
                                      activation='sigmoid',
                                      name='dense_layer'),
                                name='timesteps_layer'))
self._model.compile(optimizer=Adam(learning_rate=lr),
                    loss='binary_crossentropy',
                    metrics=['accuracy'])
This works perfectly fine; we just have to feed an ndarray of shape (X, max_sequence_size), i.e. X padded sequences of max_sequence_size time-steps (word indices).
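For reference, a minimal sketch of preparing that input with pad_sequences (the index lists below are illustrative, e.g. produced by a fitted Tokenizer):

from keras.preprocessing.sequence import pad_sequences

sequences = [[4, 18, 7], [12, 5, 9, 23, 2]]  # word indices per sentence
X_indices = pad_sequences(sequences, maxlen=max_sequence_size,
                          padding='post', value=0)
# X_indices.shape == (len(sequences), max_sequence_size)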
Saving the pre-trained embeddings inside the model blows up the model's size (450 MB per model). If someone wants to use this architecture for multiple models on their own system, say 20 of them, they need approx. 10 GB to store them all! The bottleneck is that every model stores its own copy of the word-embedding weights, even though they are always the same.
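A back-of-the-envelope check makes that figure plausible (the vocabulary size and dimensionality below are assumptions for illustration, not measured from my setup):

# Hypothetical figures: ~375k-word vocabulary, 300-dimensional float32 vectors
vocab_size, embedding_dim, bytes_per_float32 = 375_000, 300, 4
size_mb = vocab_size * embedding_dim * bytes_per_float32 / 1024**2
print('%.0f MB' % size_mb)  # ~429 MB of embedding weights baked into every model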
Trying to find a way to decrease the model's size, I thought it would be better to load the actual feature vectors (embeddings) externally, i.e. to feed an ndarray of shape (X, max_sequence_size, embeddings_size), which is X padded sequences of max_sequence_size time-steps of the actual embedding vectors.
I can't find any discussion of this issue. In the Keras documentation, the Embedding layer seems to be the only suggested way to feed word vectors into an RNN, and the Keras community seems to underestimate this memory problem. So I tried to work out a solution myself, which I post as an answer below.
Upvotes: 1
Views: 1199
Reputation: 135
SOLUTION (EMBEDDINGS LOADED EXTERNALLY):
from keras.models import Sequential
from keras.layers import InputLayer, Dropout, Bidirectional, LSTM, TimeDistributed, Dense
from keras.optimizers import Adam

self._model = Sequential(name='core_sequential')
self._model.add(InputLayer(input_shape=(None, 200)))
self._model.add(Dropout(dropout_rate, name='dropout_layer_1'))
self._model.add(Bidirectional(LSTM(units=300,
                                   return_sequences=distributed,
                                   activation='tanh',
                                   name='lstm_layer'),
                              name='birnn_layer'))
self._model.add(Dropout(dropout_rate, name='dropout_layer_2'))
self._model.add(TimeDistributed(Dense(units=1,
                                      activation='sigmoid',
                                      name='dense_layer'),
                                name='timesteps_layer'))
self._model.compile(optimizer=Adam(learning_rate=lr),
                    loss='binary_crossentropy',
                    metrics=['accuracy'])
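To feed this architecture, the lookup from word indices to vectors happens outside the model, against one shared copy of the embedding matrix on disk. A minimal sketch, assuming weights is that matrix as a NumPy array of shape (vocab_size, 200) and index_sequences are the zero-padded index sequences from before (all names here are illustrative):

import numpy as np

def embed_externally(index_sequences, weights, max_sequence_size):
    # Output shape: (num_sequences, max_sequence_size, embeddings_size)
    X = np.zeros((len(index_sequences), max_sequence_size, weights.shape[1]),
                 dtype='float32')
    for i, seq in enumerate(index_sequences):
        for t, idx in enumerate(seq[:max_sequence_size]):
            if idx != 0:  # leave padded positions as all-zero vectors
                X[i, t] = weights[idx]
    return X

Leaving the padded positions as all-zero vectors matters: it is exactly what the masked variant below exploits.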
The above code works, but consider the following: with a plain InputLayer, the network has no way to ignore the zero-padded time-steps, so the padding vectors are processed by the LSTM like real input, wasting computation and potentially hurting accuracy. So I suggest the following solution.
MUCH BETTER SOLUTION (EMBEDDINGS LOADED EXTERNALLY + MASKING):
from keras.layers import Masking  # plus the imports from the snippet above

self._model = Sequential(name='core_sequential')
self._model.add(Masking(mask_value=0., input_shape=(None, 200)))
self._model.add(Dropout(dropout_rate, name='dropout_layer_1'))
self._model.add(Bidirectional(LSTM(units=300,
                                   return_sequences=distributed,
                                   activation='tanh',
                                   name='lstm_layer'),
                              name='birnn_layer'))
self._model.add(Dropout(dropout_rate, name='dropout_layer_2'))
self._model.add(TimeDistributed(Dense(units=1,
                                      activation='sigmoid',
                                      name='dense_layer'),
                                name='timesteps_layer'))
self._model.compile(optimizer=Adam(learning_rate=lr),
                    loss='binary_crossentropy',
                    metrics=['accuracy'])
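Masking skips a time-step whenever every feature at that step equals mask_value, so the all-zero padding vectors produced above are ignored by the downstream LSTM instead of being processed as real words. A quick usage sketch (X_train and y_train are illustrative names, prepared as described above):

# X_train: float32 array of shape (num_samples, max_sequence_size, 200), zero-padded
# y_train: per-time-step labels of shape (num_samples, max_sequence_size, 1)
self._model.fit(X_train, y_train, batch_size=32, epochs=10)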
Feel free to comment and criticize; you are more than welcome!
Upvotes: 0