WVJoe
WVJoe

Reputation: 525

Should my model with Monte Carlo dropout provide a mean prediction similar to the deterministic prediction?

I have a model trained with multiple LayerNormalization layers, and I am unsure if a simple weight transfer works properly when activating dropout for prediction. This is the code I am using:

from tensorflow.keras.models import load_model, Model
from tensorflow.keras.layers import Dense, Dropout, LayerNormalization, Input

model0 = load_model(path + 'model0.h5')
OW = model0.get_weights()

inp = Input(shape=(10,))
D1 = Dense(760, activation='softplus')(inp)
DO1 = Dropout(0.29)(D1,training=True)
N1 = LayerNormalization()(DO1)
D2 = Dense(460,activation='softsign')(N1)
DO2 = Dropout(0.16)(D2,training=True)
N2 = LayerNormalization()(DO2)
D3 = Dense(664,activation='softsign')(N2)
DO3 = Dropout(0.09)(D3,training=True)
N3 = LayerNormalization()(DO3)
out = Dense(1,activation='linear')(N3)

mP = Model(inp,out)
mP.set_weights(OW)
mP.compile(loss='mse',optimizer='Adam')
mP.save(path + 'new_model.h5')

If I set training=False on the dropout layers, the model makes identical predictions to the original model. However, when the code is written as above the mean prediction is not close to the original/deterministic prediction.

Previous models that I had developed with dropout set to training had mean probabilistic predictions nearly identical to the deterministic model. Is there something I am doing incorrectly, or is this an issue with using LayerNormalization and active dropout? As far as I know, LayerNormalization has trainable parameters, so i didn't know if active dropout interferes with that. If it does, I am not sure how to remedy this.

This segment of code is for running a quick test and plotting the results:

inputs = np.zeros(shape=(1,10),dtype='float32')
inputsP = np.zeros(shape=(1000,10),dtype='float32')
opD = mD.predict(inputs)[0,0]
opP = mP.predict(inputsP).reshape(1000)
print('Deterministic: %.4f   Probabilistic: %.4f' % (opD,np.mean(opP)))

plt.scatter(0,opD,color='black',label='Det',zorder=3)
plt.scatter(0,np.mean(opP),color='red',label='Mean prob',zorder=2)
plt.errorbar(0,np.mean(opP),yerr=np.std(opP),color='red',zorder=2,markersize=0, capsize=20,label=r'$\sigma$ bounds')
plt.grid(axis='y',zorder=0)
plt.legend()
plt.tick_params(axis='x',labelsize=0,labelcolor='white',color='white',width=0,length=0)

And the resulting output and plot are shown below.

Deterministic: -0.9732 Probabilistic: -0.9011

uncertainty results

Upvotes: 3

Views: 715

Answers (1)

Will.Evo
Will.Evo

Reputation: 1177

Edit to my answer:

I think the problem is just an under-sampling from the model. The standard deviation of the predictions is directly tied to the dropout rate and thus the number of predictions you need to approximate the determistic model goes up as well. If you do an absurd test of the code below but with dropout set to 0.7 for each dropout layer, 100,000 samples is no longer enough to approximate the deterministic mean to within 10^-3 and the standard deviation of the predictions gets much larger.

import os

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Dropout, Input

os.environ['CUDA_VISIBLE_DEVICES'] = '0'
GPUs = tf.config.experimental.list_physical_devices('GPU')
for gpu in GPUs:
    tf.config.experimental.set_memory_growth(gpu, True)

inp = Input(shape=(10,))
D1 = Dense(760, activation='softplus')(inp)
D2 = Dense(460, activation='softsign')(D1)
D3 = Dense(664, activation='softsign')(D2)
out = Dense(1, activation='linear')(D3)

mP = Model(inp, out)
mP.compile(loss='mse', optimizer='Adam')

inp = Input(shape=(10,))
D1 = Dense(760, activation='softplus')(inp)
DO1 = Dropout(0.29)(D1,training=False)
D2 = Dense(460, activation='softsign')(DO1)
DO2 = Dropout(0.16)(D2,training=True)
D3 = Dense(664, activation='softsign')(DO2)
DO3 = Dropout(0.09)(D3,training=True)
out = Dense(1, activation='linear')(DO3)

mP2 = Model(inp, out)
mP2.set_weights(mP.get_weights())
mP2.compile(loss='mse', optimizer='Adam')

data = np.zeros(shape=(100000, 10),dtype='float32')
res = mP.predict(data).reshape(data.shape[0])
res2 = mP2.predict(data).reshape(data.shape[0])

print (np.abs(res[0] - res2.mean()))

Upvotes: 1

Related Questions