Grigory Sharkov

Reputation: 140

Bidirectional GRU with 2x2 inputs

I am building a network that splits strings into words and words into characters, embeds each character, and then computes a vector representation of the string by aggregating characters into words and words into a string. Aggregation is performed with a bidirectional GRU layer with attention.
To test this, let's say I keep 5 words per string and 5 characters per word. In this case my transformation is:

["Some string"] -> ["Some","string","","",""] -> 
["Some_","strin","_____","_____","_____"] (where _ is the padding symbol) -> 
[[1,2,3,4,0],[1,5,6,7,8],[0,0,0,0,0],[0,0,0,0,0],[0,0,0,0,0]] (shape 5x5)
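A minimal sketch of this preprocessing in plain Python. The lowercasing, whitespace splitting and first-come character numbering (ids start at 1, 0 is reserved for padding) are assumptions chosen so that the output matches the 5x5 matrix above; `encode` is a hypothetical helper, not part of any library.

```python
MAX_WORDS = 5  # words kept per string
MAX_CHARS = 5  # characters kept per word
PAD_ID = 0     # id reserved for the padding symbol "_"

def encode(text, vocab):
    """Turn a string into a MAX_WORDS x MAX_CHARS grid of character ids.

    vocab is a dict char -> id that is grown on the fly (assumed numbering
    scheme; a real pipeline would use a fixed vocabulary).
    """
    words = text.lower().split()[:MAX_WORDS]   # truncate extra words
    words += [""] * (MAX_WORDS - len(words))   # pad missing words
    grid = []
    for word in words:
        ids = []
        for ch in word[:MAX_CHARS]:            # truncate long words
            if ch not in vocab:
                vocab[ch] = len(vocab) + 1     # new id; 0 stays free for padding
            ids.append(vocab[ch])
        ids += [PAD_ID] * (MAX_CHARS - len(ids))  # pad short words
        grid.append(ids)
    return grid

print(encode("Some string", {}))
# [[1, 2, 3, 4, 0], [1, 5, 6, 7, 8], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
```

Note that "string" is cut to "strin" because only 5 characters per word are kept.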

Next, an embedding layer turns every character into an embedding vector of length, say, 6, so each feature becomes a 5x5x6 tensor. I then pass this output to a bidirectional GRU layer and perform some other manipulations that I believe are not important here.
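An embedding layer is essentially a row lookup in a (vocab_size x 6) table, so it appends one axis to whatever shape it receives. The NumPy sketch below is a stand-in for `tf.keras.layers.Embedding`, with an assumed vocabulary size of 9 (ids 0..8 from the example) and random weights; it shows where the fourth dimension in the error further down comes from.

```python
import numpy as np

VOCAB_SIZE = 9  # ids 0..8 from the example (assumed)
EMBED_DIM = 6

rng = np.random.default_rng(0)
table = rng.normal(size=(VOCAB_SIZE, EMBED_DIM))  # one row per character id

def embed(ids):
    # Fancy indexing picks one row per id, so the result has shape ids.shape + (EMBED_DIM,)
    return table[np.asarray(ids)]

single = np.array([[1, 2, 3, 4, 0], [1, 5, 6, 7, 8]] + [[0] * 5] * 3)  # one 5x5 string
print(embed(single).shape)             # (5, 5, 6): 3D, a GRU accepts it
print(embed(single[None, ...]).shape)  # (1, 5, 5, 6): 4D once a batch axis is added
```

Presumably this is why the per-element loop appears to work: an unbatched (5, 5) element embeds to a 3D tensor, and the GRU then treats the 5 words as the batch axis. Batching adds a leading axis and the embedded tensor becomes 4D.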

The problem is that when I call the model on each element in a loop, like

for string in strings:
    output = model(string)

it seems to work just fine (strings is a tf.data.Dataset created from slices of shape 5x5, so each element is a single 5x5 matrix).

However, when I move on to training, or work at the dataset level with functions like predict, the model fails:

model.predict(strings.batch(1))
ValueError: Input 0 of layer bidirectional is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (None, 5, 5, 6)

As far as I understand from the documentation, the Bidirectional layer takes a 3D tensor as input, [batch, timesteps, features], so in this case my input shape should look like [batch_size, timesteps, (5, 5, 6)].
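One way to reason about the shapes: a recurrent layer cannot take a (5, 5, 6) feature per timestep, but the word axis can be folded into the batch axis so that every word becomes its own 3D sample, and then unfolded again after the character-level encoder. The NumPy sketch below follows only the shapes; `char_encoder` is a hypothetical stand-in that uses mean pooling in place of a real character-level GRU.

```python
import numpy as np

batch, words, chars, dim = 2, 5, 5, 6
x = np.random.default_rng(0).normal(size=(batch, words, chars, dim))  # embedded input, 4D

def char_encoder(seq):
    """Hypothetical stand-in for a char-level RNN: (n, timesteps, dim) -> (n, dim).
    Mean pooling is only a placeholder so the shapes can be followed."""
    return seq.mean(axis=1)

# Fold words into the batch axis: (2, 5, 5, 6) -> (10, 5, 6), a valid 3D RNN input
folded = x.reshape(batch * words, chars, dim)
word_vecs = char_encoder(folded)               # (10, 6): one vector per word
# Unfold: words become the timesteps of a 3D tensor for the word-level RNN
word_seq = word_vecs.reshape(batch, words, -1)
print(folded.shape, word_seq.shape)            # (10, 5, 6) (2, 5, 6)
```

The reshape keeps row order (batch first, then word), so `word_seq[b, w]` is the encoding of word `w` of string `b`.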

So the question is: which transformation should I apply to the input data to get this kind of shape?
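In Keras, the fold/unfold over the word axis is what `tf.keras.layers.TimeDistributed` does: it applies the wrapped layer independently along axis 1. A minimal sketch of a two-level encoder under that assumption follows; the vocabulary size, the GRU size and the zero-filled dataset are placeholders, and the attention step from the question is left out.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

MAX_WORDS, MAX_CHARS = 5, 5     # sizes from the example above
VOCAB_SIZE, EMBED_DIM = 100, 6  # VOCAB_SIZE is an assumed placeholder
UNITS = 16                      # hypothetical GRU size

chars = layers.Input(shape=(MAX_WORDS, MAX_CHARS), dtype="int32")  # (batch, 5, 5)
x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(chars)                  # (batch, 5, 5, 6)

# Character level: the same BiGRU runs over the 5 characters of every word,
# so the GRU itself only ever sees 3D input (batch * words, chars, features).
word_vecs = layers.TimeDistributed(
    layers.Bidirectional(layers.GRU(UNITS)))(x)                       # (batch, 5, 32)

# Word level: a second BiGRU treats the 5 word vectors as timesteps.
string_vec = layers.Bidirectional(layers.GRU(UNITS))(word_vecs)      # (batch, 32)

model = tf.keras.Model(chars, string_vec)

# Dummy stand-in for the question's dataset: three 5x5 id matrices, batched by 2.
strings = tf.data.Dataset.from_tensor_slices(np.zeros((3, 5, 5), dtype="int32"))
out = model.predict(strings.batch(2), verbose=0)
print(out.shape)  # (3, 32)
```

With this structure, batching no longer collides with the recurrent layers' 3D input requirement.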

Upvotes: 0

Views: 1285

Answers (1)

user11530462

If you stack Bidirectional GRU layers, pass return_sequences=True to every GRU except the last one. By default a GRU returns only its last output, a 2D tensor of shape (batch, units); with return_sequences=True it returns the whole sequence, a 3D tensor of shape (batch, timesteps, units), and a stacked Bidirectional layer needs exactly that 3D input.
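A short shape check of the difference, using dummy zero input of shape (batch=2, timesteps=5, features=10). Bidirectional concatenates the forward and backward outputs by default, hence 128 = 2 x 64. Note that return_sequences changes the rank of a GRU's output, not of its input, so on its own it does not make a 4D input like (None, 5, 5, 6) acceptable.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

x = np.zeros((2, 5, 10), dtype="float32")  # (batch, timesteps, features)

# Default: only the last output per direction -> 2D, cannot feed another RNN
last_only = layers.Bidirectional(layers.GRU(64))(x)
# return_sequences=True: one output per timestep -> 3D, valid input for a stacked RNN
full_seq = layers.Bidirectional(layers.GRU(64, return_sequences=True))(x)

print(last_only.shape)  # (2, 128)
print(full_seq.shape)   # (2, 5, 128)
```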

Sample code

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
model = keras.Sequential()

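# First layer returns the full sequence, (batch, 5, 128), so the next RNN gets 3D input
model.add(
    layers.Bidirectional(layers.GRU(64, return_sequences=True), input_shape=(5, 10))
)
# Last recurrent layer returns only its final output, (batch, 64)
model.add(layers.Bidirectional(layers.GRU(32)))
model.add(layers.Dense(10))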

model.summary()

Output (TensorFlow 2.x; the GRU parameter counts assume the default reset_after=True)

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
bidirectional_3 (Bidirection (None, 5, 128)            29184     
_________________________________________________________________
bidirectional_4 (Bidirection (None, 64)                31104     
_________________________________________________________________
dense_2 (Dense)              (None, 10)                650       
=================================================================
Total params: 60,938
Trainable params: 60,938
Non-trainable params: 0
_________________________________________________________________
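As a sanity check on a summary like this, the parameter count of a Keras recurrent layer can be computed by hand. The sketch below assumes the standard Keras weight layout: a GRU has 3 gates, each with an input kernel, a recurrent kernel and bias (two bias vectors when reset_after=True, the TF 2.x default); an LSTM has 4 gates with one bias vector each. A Bidirectional wrapper holds two copies of the layer. The helper names are hypothetical.

```python
def gru_params(units, input_dim, reset_after=True):
    """Trainable weights in one Keras GRU layer."""
    biases = 2 if reset_after else 1
    return 3 * units * (input_dim + units + biases)

def lstm_params(units, input_dim):
    """Trainable weights in one Keras LSTM layer, for comparison."""
    return 4 * units * (input_dim + units + 1)

# Bidirectional doubles the count (forward and backward copies)
bi_gru_1 = 2 * gru_params(64, 10)    # input (5, 10)
bi_gru_2 = 2 * gru_params(32, 128)   # input is the 128-wide sequence from layer 1
dense = 64 * 10 + 10                 # kernel + bias of Dense(10)

print(bi_gru_1, bi_gru_2, dense)      # 29184 31104 650
print(bi_gru_1 + bi_gru_2 + dense)    # 60938
print(2 * lstm_params(64, 10), 2 * lstm_params(32, 128))  # 38400 41216
```

The same stack built from LSTM layers would have noticeably more parameters, so a summary is a quick way to confirm which cell type a model actually uses.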

Upvotes: 1
