Reputation: 525
I have a model trained with multiple LayerNormalization layers, and I am unsure if a simple weight transfer works properly when activating dropout for prediction. This is the code I am using:
from tensorflow.keras.models import load_model, Model
from tensorflow.keras.layers import Dense, Dropout, LayerNormalization, Input
model0 = load_model(path + 'model0.h5')
OW = model0.get_weights()
inp = Input(shape=(10,))
D1 = Dense(760, activation='softplus')(inp)
DO1 = Dropout(0.29)(D1,training=True)
N1 = LayerNormalization()(DO1)
D2 = Dense(460,activation='softsign')(N1)
DO2 = Dropout(0.16)(D2,training=True)
N2 = LayerNormalization()(DO2)
D3 = Dense(664,activation='softsign')(N2)
DO3 = Dropout(0.09)(D3,training=True)
N3 = LayerNormalization()(DO3)
out = Dense(1,activation='linear')(N3)
mP = Model(inp,out)
mP.set_weights(OW)
mP.compile(loss='mse',optimizer='Adam')
mP.save(path + 'new_model.h5')
If I set training=False on the dropout layers, the model makes identical predictions to the original model. However, when the code is written as above, the mean prediction is not close to the original/deterministic prediction.
Previous models that I developed with dropout set to training=True had mean probabilistic predictions nearly identical to those of the deterministic model. Is there something I am doing incorrectly, or is this an issue with combining LayerNormalization and active dropout? As far as I know, LayerNormalization has trainable parameters, so I don't know whether active dropout interferes with that. If it does, I am not sure how to remedy this.
This segment of code is for running a quick test and plotting the results:
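import numpy as np
import matplotlib.pyplot as plt
# mD is assumed to be the deterministic model (e.g. model0 loaded above); mP is the dropout-active copy.
inputs = np.zeros(shape=(1,10),dtype='float32')
inputsP = np.zeros(shape=(1000,10),dtype='float32')
opD = mD.predict(inputs)[0,0]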
opP = mP.predict(inputsP).reshape(1000)
print('Deterministic: %.4f Probabilistic: %.4f' % (opD,np.mean(opP)))
plt.scatter(0,opD,color='black',label='Det',zorder=3)
plt.scatter(0,np.mean(opP),color='red',label='Mean prob',zorder=2)
plt.errorbar(0,np.mean(opP),yerr=np.std(opP),color='red',zorder=2,markersize=0, capsize=20,label=r'$\sigma$ bounds')
plt.grid(axis='y',zorder=0)
plt.legend()
plt.tick_params(axis='x',labelsize=0,labelcolor='white',color='white',width=0,length=0)
And the resulting output and plot are shown below.
Deterministic: -0.9732 Probabilistic: -0.9011
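One way to judge whether a gap like this is larger than sampling noise is to compare it against the standard error of the mean of the stochastic predictions. The sketch below is only illustrative: the helper name gap_in_standard_errors is hypothetical, and the sample array is synthetic (made-up spread, not the real opP), standing in for the 1000 dropout predictions.

```python
import numpy as np

def gap_in_standard_errors(samples, deterministic):
    """Return (gap, standard error of the mean, gap / standard error)."""
    samples = np.asarray(samples, dtype=np.float64)
    gap = samples.mean() - deterministic
    # ddof=1 gives the unbiased sample variance estimate.
    sem = samples.std(ddof=1) / np.sqrt(samples.size)
    return gap, sem, gap / sem

# Synthetic stand-in for opP: 1000 draws with a made-up mean and spread.
rng = np.random.default_rng(0)
fake_opP = rng.normal(loc=-0.90, scale=0.05, size=1000)

gap, sem, ratio = gap_in_standard_errors(fake_opP, deterministic=-0.9732)
print('gap=%.4f  sem=%.4f  gap/sem=%.1f' % (gap, sem, ratio))
```

With the real opP in place of fake_opP, a ratio of only a few units would be consistent with sampling noise, while a much larger ratio would suggest that more samples alone will not close the gap.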
Upvotes: 3
Views: 715
Reputation: 1177
Edit to my answer:
I think the problem is simply under-sampling from the model. The standard deviation of the predictions grows with the dropout rate, so the number of predictions you need to approximate the deterministic model goes up as well. As an extreme test, run the code below with dropout set to 0.7 for each dropout layer: 100,000 samples is then no longer enough to approximate the deterministic mean to within 10^-3, and the standard deviation of the predictions gets much larger.
import os
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Dropout, Input
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
GPUs = tf.config.experimental.list_physical_devices('GPU')
for gpu in GPUs:
    tf.config.experimental.set_memory_growth(gpu, True)
inp = Input(shape=(10,))
D1 = Dense(760, activation='softplus')(inp)
D2 = Dense(460, activation='softsign')(D1)
D3 = Dense(664, activation='softsign')(D2)
out = Dense(1, activation='linear')(D3)
mP = Model(inp, out)
mP.compile(loss='mse', optimizer='Adam')
inp = Input(shape=(10,))
D1 = Dense(760, activation='softplus')(inp)
DO1 = Dropout(0.29)(D1,training=False)
D2 = Dense(460, activation='softsign')(DO1)
DO2 = Dropout(0.16)(D2,training=True)
D3 = Dense(664, activation='softsign')(DO2)
DO3 = Dropout(0.09)(D3,training=True)
out = Dense(1, activation='linear')(DO3)
mP2 = Model(inp, out)
mP2.set_weights(mP.get_weights())
mP2.compile(loss='mse', optimizer='Adam')
data = np.zeros(shape=(100000, 10),dtype='float32')
res = mP.predict(data).reshape(data.shape[0])
res2 = mP2.predict(data).reshape(data.shape[0])
print(np.abs(res[0] - res2.mean()))
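The sampling-noise argument can also be illustrated without TensorFlow. The following is a minimal NumPy sketch, not the model above: it assumes a single hypothetical linear unit with random weights and a fixed random input, and applies inverted dropout (kept inputs are rescaled by 1 / (1 - rate)). For such a linear unit the dropout mean equals the deterministic output in expectation, so any remaining gap is pure sampling noise; the sketch says nothing about what happens through nonlinear layers.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=64)   # hypothetical weights of one linear unit
x = rng.normal(size=64)   # hypothetical fixed input
deterministic = float(w @ x)

def mc_dropout_mean(rate, n_samples):
    """Mean output over n_samples inverted-dropout masks applied to x."""
    keep = rng.random((n_samples, x.size)) >= rate  # True where an input is kept
    dropped = keep * x / (1.0 - rate)               # rescale so E[dropped] == x
    return float((dropped @ w).mean())

for rate in (0.1, 0.7):
    for n in (100, 10000):
        err = abs(mc_dropout_mean(rate, n) - deterministic)
        print('rate=%.1f  n=%6d  |mean - deterministic|=%.4f' % (rate, n, err))
```

In this setting the error of the mean shrinks roughly as 1 / sqrt(n) and is larger at the higher dropout rate, which is the effect described above.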
Upvotes: 1