stevew

Reputation: 734

Using dilated convolution in Keras

In WaveNet, dilated convolution is used to increase the receptive field of the layers above.

[Animated illustration: dilated convolution]

From the illustration, you can see that stacking dilated convolutions with kernel size 2 and dilation rates that are powers of 2 creates a tree-like structure of receptive fields. I tried to (very simply) replicate the above in Keras.

import tensorflow.keras as keras
nn = input_layer = keras.layers.Input(shape=(200, 2))
nn = keras.layers.Conv1D(5, 5, padding='causal', dilation_rate=2)(nn)
nn = keras.layers.Conv1D(5, 5, padding='causal', dilation_rate=4)(nn)
nn = keras.layers.Dense(1)(nn)
model = keras.Model(input_layer, nn)
opt = keras.optimizers.Adam(lr=0.001)
model.compile(loss='mse', optimizer=opt)
model.summary()

And the output:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_4 (InputLayer)         [(None, 200, 2)]          0
_________________________________________________________________
conv1d_5 (Conv1D)            (None, 200, 5)            55
_________________________________________________________________
conv1d_6 (Conv1D)            (None, 200, 5)            130
_________________________________________________________________
dense_2 (Dense)              (None, 200, 1)            6
=================================================================
Total params: 191
Trainable params: 191
Non-trainable params: 0
_________________________________________________________________

I was expecting axis=1 (the time axis) to shrink after each Conv1D layer, similar to the gif. Why is this not the case?

Upvotes: 5

Views: 9663

Answers (2)

jeffery_the_wind

Reputation: 18158

Here's an example of this dilation with 1D convolutional layers; each Conv1D outputs 14 channels:

https://github.com/jwallbridge/translob/blob/master/python/LobFeatures.py

from tensorflow.keras import layers

def lob_dilated(x):
    """TransLOB dilated 1-D convolution module."""
    # The dilation rate doubles at each layer, so the receptive field grows
    # exponentially while 'causal' padding keeps the sequence length unchanged.
    x = layers.Conv1D(14, kernel_size=2, strides=1, activation='relu', padding='causal')(x)
    x = layers.Conv1D(14, kernel_size=2, dilation_rate=2, activation='relu', padding='causal')(x)
    x = layers.Conv1D(14, kernel_size=2, dilation_rate=4, activation='relu', padding='causal')(x)
    x = layers.Conv1D(14, kernel_size=2, dilation_rate=8, activation='relu', padding='causal')(x)
    y = layers.Conv1D(14, kernel_size=2, dilation_rate=16, activation='relu', padding='causal')(x)

    return y
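For instance, you could wrap this module in a Keras model and confirm that the time axis is preserved (a minimal usage sketch, not from the linked repo; the input shape here is hypothetical):

import tensorflow.keras as keras

# Hypothetical input: 100 time steps with 40 features per step.
inputs = keras.layers.Input(shape=(100, 40))
outputs = lob_dilated(inputs)   # causal dilated stack defined above
model = keras.Model(inputs, outputs)
model.summary()                 # every Conv1D layer keeps shape (None, 100, 14)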

Upvotes: 0

DMolony

Reputation: 643

The model summary is as expected. As you note, using dilated convolutions increases the receptive field. However, a dilated convolution preserves the output shape of the input image/activation, because we are only changing the convolutional kernel. A regular kernel could be the following:

0 1 0
1 1 1
0 1 0

A kernel with a dilation rate of 2 inserts zeros between the entries of the original kernel, as below:

0 0 1 0 0
0 0 0 0 0
1 0 1 0 1
0 0 0 0 0
0 0 1 0 0

In fact, you can see that our original kernel is itself a dilated kernel with a dilation rate of 1. Alternative ways to increase the receptive field, such as max pooling and strided convolution, downsize the input image instead.
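To make the kernel picture concrete, here is a small NumPy sketch that expands a 3x3 kernel into its dilation-rate-2 equivalent. This is purely illustrative; Keras does not materialize the zeros, it simply skips over them when sliding the kernel.

import numpy as np

def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zeros between neighbouring kernel entries."""
    k = kernel.shape[0]
    size = rate * (k - 1) + 1            # effective kernel size
    dilated = np.zeros((size, size), dtype=kernel.dtype)
    dilated[::rate, ::rate] = kernel     # original weights land on a strided grid
    return dilated

kernel = np.array([[0, 1, 0],
                   [1, 1, 1],
                   [0, 1, 0]])
print(dilate_kernel(kernel, rate=2))
# [[0 0 1 0 0]
#  [0 0 0 0 0]
#  [1 0 1 0 1]
#  [0 0 0 0 0]
#  [0 0 1 0 0]]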

For example, if you want to increase the receptive field while shrinking the output shape, you could use strided convolution as below. Here I replace the dilated convolutions with strided convolutions; you will see that the output shape shrinks at every layer.

import tensorflow.keras as keras
nn = input_layer = keras.layers.Input(shape=(200, 2))
nn = keras.layers.Conv1D(5, 5, padding='causal', strides=2)(nn)  # 200 -> 100 time steps
nn = keras.layers.Conv1D(5, 5, padding='causal', strides=4)(nn)  # 100 -> 25 time steps
nn = keras.layers.Dense(1)(nn)
model = keras.Model(input_layer, nn)
opt = keras.optimizers.Adam(lr=0.001)
model.compile(loss='mse', optimizer=opt)
model.summary()

Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         [(None, 200, 2)]          0
_________________________________________________________________
conv1d_3 (Conv1D)            (None, 100, 5)            55
_________________________________________________________________
conv1d_4 (Conv1D)            (None, 25, 5)             130
_________________________________________________________________
dense_1 (Dense)              (None, 25, 1)             6
=================================================================
Total params: 191
Trainable params: 191
Non-trainable params: 0
_________________________________________________________________

To summarize, dilated convolution is just another way to increase the receptive field of your model. It has the benefit of preserving the output shape of your input image.
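As a rough check on how quickly the receptive field grows, here is a back-of-the-envelope sketch using the standard formula for a stride-1 stack, rf = 1 + sum((kernel_size - 1) * dilation_rate). The layer configuration below mirrors the question's model; it is a hand calculation, not something Keras reports.

# Receptive field of a stack of stride-1 dilated Conv1D layers:
# rf = 1 + sum((kernel_size - 1) * dilation_rate) over all layers.
def receptive_field(layers_cfg):
    rf = 1
    for kernel_size, dilation_rate in layers_cfg:
        rf += (kernel_size - 1) * dilation_rate
    return rf

# The question's model: two Conv1D layers with kernel size 5,
# dilation rates 2 and 4.
print(receptive_field([(5, 2), (5, 4)]))   # 1 + 4*2 + 4*4 = 25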

Upvotes: 6
