PyTorch : Aggregate two models

Question

Hello and greetings from Greece

import torch
import torch.nn as nn


class Model(nn.Module):

    def __init__(self, embedding_size, num_numerical_cols, output_size, layers, p=0.4):
        super().__init__()
        # One embedding table per categorical column: (num_categories, embedding_dim)
        self.all_embeddings = nn.ModuleList([nn.Embedding(ni, nf) for ni, nf in embedding_size])
        self.embedding_dropout = nn.Dropout(p)
        self.batch_norm_num = nn.BatchNorm1d(num_numerical_cols)

        all_layers = []
        # Total width of the concatenated embeddings
        num_categorical_cols = sum(nf for ni, nf in embedding_size)
        input_size = num_categorical_cols + num_numerical_cols

        # Hidden blocks: Linear -> ReLU -> BatchNorm -> Dropout
        for i in layers:
            all_layers.append(nn.Linear(input_size, i))
            all_layers.append(nn.ReLU(inplace=True))
            all_layers.append(nn.BatchNorm1d(i))
            all_layers.append(nn.Dropout(p))
            input_size = i

        all_layers.append(nn.Linear(layers[-1], output_size))

        self.layers = nn.Sequential(*all_layers)

    def forward(self, x_categorical, x_numerical):
        # Embed each categorical column and concatenate the results
        embeddings = []
        for i, e in enumerate(self.all_embeddings):
            embeddings.append(e(x_categorical[:, i]))
        x = torch.cat(embeddings, 1)
        x = self.embedding_dropout(x)

        # Normalize the numerical features and append them to the embeddings
        x_numerical = self.batch_norm_num(x_numerical)
        x = torch.cat([x, x_numerical], 1)
        x = self.layers(x)
        return x

Suppose I have this neural network for classification and I create two instances:

model_1=Model(categorical_embedding_sizes, numerical_data.shape[1], 2, [200,100,50], p=0.4)
model_2=Model(categorical_embedding_sizes, numerical_data.shape[1], 2, [200,100,50], p=0.4)

After training these two models, I saved them with torch.save as model_1.pt and model_2.pt. Is there a way to create a new model whose parameters are the mean of the two models' parameters?

Something like:

model_new.weight=(model_1.weight+model_2.weight)/2
model_new.bias=(model_1.bias+model_2.bias)/2

Thank you in advance

Ivan · Accepted Answer

You can easily do this by generating a state dictionary from your two models' state dictionaries:

state_1 = model_1.state_dict()
state_2 = model_2.state_dict()

for layer in state_1:
    state_1[layer] = (state_1[layer] + state_2[layer])/2

The loop above visits every entry in the state dictionary: the parameters (weights and biases) of all layers, plus buffers such as the BatchNorm running statistics.
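To see what such a loop iterates over, the sketch below collects the keys of a state dictionary. The small `net` is a hypothetical stand-in for the question's Model, not the original network.

```python
import torch
import torch.nn as nn

# Hypothetical small network, standing in for the question's Model.
net = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8))

# Each key is "<module name>.<tensor name>"; values are plain tensors.
keys = list(net.state_dict().keys())
for name, tensor in net.state_dict().items():
    print(name, tuple(tensor.shape), tensor.dtype)
```

Besides the weights and biases, the BatchNorm layer contributes `running_mean`, `running_var` and the integer counter `num_batches_tracked`, so all of these pass through the averaging loop as well.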

Then load this new state into either model_1 or a newly instantiated model, like so:

model_new = Model(categorical_embedding_sizes, numerical_data.shape[1], 2, [200,100,50], p=0.4)
model_new.load_state_dict(state_1)
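Since the question mentions files written with torch.save, here is a minimal end-to-end sketch of the same idea. It assumes the files hold state dictionaries (saved via `torch.save(model.state_dict(), path)`); if the whole model object was saved instead, call `.state_dict()` on the loaded object first. `make_net` and `average_state_dicts` are hypothetical helpers introduced for illustration, and copying integer buffers instead of averaging them is a choice made here, not part of the answer above.

```python
import os
import tempfile

import torch
import torch.nn as nn


def make_net():
    # Hypothetical stand-in for the question's Model class.
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.BatchNorm1d(8), nn.Linear(8, 2))


def average_state_dicts(state_a, state_b):
    """Return a new dict with the element-wise mean of two state dicts."""
    averaged = {}
    for key, tensor_a in state_a.items():
        tensor_b = state_b[key]
        if tensor_a.is_floating_point():
            averaged[key] = (tensor_a + tensor_b) / 2
        else:
            # Integer buffers (e.g. BatchNorm's num_batches_tracked) are
            # copied from the first model instead of averaged.
            averaged[key] = tensor_a.clone()
    return averaged


net_1, net_2 = make_net(), make_net()

with tempfile.TemporaryDirectory() as tmp:
    path_1 = os.path.join(tmp, "model_1.pt")
    path_2 = os.path.join(tmp, "model_2.pt")
    # Assumption: only the state dicts were saved, not the model objects.
    torch.save(net_1.state_dict(), path_1)
    torch.save(net_2.state_dict(), path_2)
    state_1 = torch.load(path_1)
    state_2 = torch.load(path_2)

# The new model must have the same architecture as the two sources.
net_new = make_net()
net_new.load_state_dict(average_state_dicts(state_1, state_2))
net_new.eval()  # switch Dropout/BatchNorm to inference behaviour
```

Building a separate dictionary leaves both loaded state dictionaries untouched, so either source model can still be restored from them afterwards.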
