What are the shapes of beta and gamma parameters in layer normalization layer?

Question

In layer normalization, we compute mean and variance across the input layer (instead of across batch which is what we do in batch normalization). And then normalize the input layer according to mean and variance, and then return gamma times normalized layer plus beta.

My question is, are the gamma and beta scalars with shape (1, 1) and (1, 1) respectively or their shapes are (1, number of hidden units) and (1, number of hidden units) respectively.

Here is how I have implemented the layer normalization, is this correct!

def layernorm(layer, gamma, beta):
    mean = np.mean(layer, axis = 1, keepdims = True)
    variance = np.mean((layer - mean) ** 2, axis=1, keepdims = True)
    layer_hat = (layer - mean) * 1.0 / np.sqrt(variance + 1e-8)
    outpus = gamma * layer_hat + beta
    return outpus

where gamma and beta are defined as below:

gamma = np.random.normal(1, 128)
beta = np.random.normal(1, 128)

Maybe · Accepted Answer

According to the Tensorflow's implementation, assume the input has shape [B, rest], gamma and beta are of shape rest. rest could be (h, ) for a 2-dimensional input or (h, w, c) for a 4-dimensional input.

What are the shapes of beta and gamma parameters in layer normalization layer?

Answers (1)

Related Questions