Reputation: 111
In layer normalization, we compute the mean and variance across the units of the input layer for each sample (instead of across the batch, as in batch normalization). We then normalize the input layer with that mean and variance and return gamma times the normalized layer plus beta.
My question is: are gamma and beta scalars, each with shape (1, 1), or do they each have shape (1, number of hidden units)?
Here is how I have implemented layer normalization. Is this correct?
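import numpy as np

def layernorm(layer, gamma, beta):
    # Per-sample statistics over the hidden units (axis 1)
    mean = np.mean(layer, axis=1, keepdims=True)
    variance = np.mean((layer - mean) ** 2, axis=1, keepdims=True)
    # Normalize; the small constant guards against division by zero
    layer_hat = (layer - mean) / np.sqrt(variance + 1e-8)
    outputs = gamma * layer_hat + beta
    return outputs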
where gamma and beta are defined as below:
gamma = np.random.normal(1, 128)
beta = np.random.normal(1, 128)
Upvotes: 1
Views: 2329
Reputation: 2279
According to TensorFlow's implementation, if the input has shape [B, rest], then gamma and beta have shape rest. Here rest could be (h,) for a 2-dimensional input or (h, w, c) for a 4-dimensional input.
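For the 2-dimensional case in the question, this can be sketched in NumPy as below. It is a minimal illustration, not TensorFlow's code, and it makes some assumptions: 128 hidden units (taken from the question), a batch of 4 random samples, and gamma initialized to ones and beta to zeros, which is a common default. Note that `np.random.normal(1, 128)` returns a single scalar drawn with mean 1 and standard deviation 128, because the first two positional arguments are `loc` and `scale`, not a shape.

```python
import numpy as np

def layernorm(layer, gamma, beta, eps=1e-8):
    # layer has shape (batch, hidden); statistics are per sample, over axis 1
    mean = np.mean(layer, axis=1, keepdims=True)
    variance = np.var(layer, axis=1, keepdims=True)
    layer_hat = (layer - mean) / np.sqrt(variance + eps)
    # gamma and beta of shape (1, hidden) broadcast over the batch axis
    return gamma * layer_hat + beta

hidden = 128                   # assumed from the question
gamma = np.ones((1, hidden))   # one scale per hidden unit (assumed init)
beta = np.zeros((1, hidden))   # one shift per hidden unit (assumed init)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, hidden))  # hypothetical batch of 4 samples
out = layernorm(x, gamma, beta)
print(out.shape)
```

With gamma and beta at these initial values, each row of `out` has approximately zero mean and unit variance. A scalar gamma and beta would also broadcast without error, but would apply one shared scale and shift to every unit instead of one per unit.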
Upvotes: 1