Patrick von Platen

Reputation: 82

How can I load the weights of some layers of a trained network into a new network in RETURNN?

I have the trained weights of the following network stored at path/to/modelFile:

network = {
    "conv_1": {"class": "conv", "filter_size": (400,), "activation": "abs", "padding": "valid", "strides": 10, "n_out": 64},
    "pad_conv_1_time_dim": {"class": "pad", "axes": "time", "padding": 20, "from": ["conv_1"]},
    "conv_2": {"class": "conv", "input_add_feature_dim": True, "filter_size": (40, 64), "activation": "abs", "padding": "valid", "strides": 16, "n_out": 128, "from": ["pad_conv_1_time_dim"]},
    "flatten_conv": {"class": "merge_dims", "axes": "except_time", "n_out": 128, "from": ["conv_2"]},
    "window_1": {"class": "window", "window_size": 17, "from": ["flatten_conv"]},
    "flatten_window": {"class": "merge_dims", "axes": "except_time", "from": ["window_1"]},
    "lin_1": {"class": "linear", "activation": None, "n_out": 512, "from": ["flatten_window"]},
    "ff_2": {"class": "linear", "activation": "relu", "n_out": 2000, "from": ["lin_1"]},
    "output": {"class": "softmax", "loss": "ce", "from": ["ff_2"]}
}

and I want to load the trained weights of the layers "conv_1" and "conv_2" into the following network:

network = {
    "conv_1": {"class": "conv", "filter_size": (400,), "activation": "abs", "padding": "valid", "strides": 10, "n_out": 64},
    "pad_conv_1_time_dim": {"class": "pad", "axes": "time", "padding": 20, "from": ["conv_1"]},
    "conv_2": {"class": "conv", "input_add_feature_dim": True, "filter_size": (40, 64), "activation": "abs", "padding": "valid", "strides": 16, "n_out": 128, "from": ["pad_conv_1_time_dim"]},
    "flatten_conv": {"class": "merge_dims", "axes": "except_time", "n_out": 128, "from": ["conv_2"]},
    "lstm1_fw": {"class": "rec", "unit": "lstmp", "n_out": rnnLayerNodes, "direction": 1, "from": ["flatten_conv"]},
    "lstm1_bw": {"class": "rec", "unit": "lstmp", "n_out": rnnLayerNodes, "direction": -1, "from": ["flatten_conv"]},
    "lin_1": {"class": "linear", "activation": None, "n_out": 512, "from": ["lstm1_fw", "lstm1_bw"]},
    "ff_2": {"class": "linear", "activation": "relu", "n_out": 2000, "from": ["lin_1"]},
    "ff_3": {"class": "linear", "activation": "relu", "n_out": 2000, "from": ["ff_2"]},
    "ff_4": {"class": "linear", "activation": "relu", "n_out": 2000, "from": ["ff_3"]},
    "output": {"class": "softmax", "loss": "ce", "from": ["ff_4"]}
}

How can I do that in RETURNN?

Upvotes: 1

Views: 97

Answers (1)

Albert

Reputation: 68320

Using the SubnetworkLayer is one option. That would look like:

trained_network_model_file = 'path/to/model_file'

trained_network = {
    "conv_1": {"class": "conv", "filter_size": (400,), "activation": "abs", "padding": "valid", "strides": 10, "n_out": 64},
    "pad_conv_1_time_dim": {"class": "pad", "axes": "time", "padding": 20, "from": ["conv_1"]},
    "conv_2": {"class": "conv", "input_add_feature_dim": True, "filter_size": (40, 64), "activation": "abs", "padding": "valid", "strides": 16, "n_out": 128, "from": ["pad_conv_1_time_dim"]},
    "flatten_conv": {"class": "merge_dims", "axes": "except_time", "n_out": 128, "from": ["conv_2"]}
}

network = {
    # "load_on_init" loads the params from the given checkpoint into this
    # subnetwork when it is initialized.
    "conv_layers": {"class": "subnetwork", "subnetwork": trained_network, "load_on_init": trained_network_model_file, "n_out": 128},
    "lstm1_fw": {"class": "rec", "unit": "lstmp", "n_out": rnnLayerNodes, "direction": 1, "from": ["conv_layers"]},
    "lstm1_bw": {"class": "rec", "unit": "lstmp", "n_out": rnnLayerNodes, "direction": -1, "from": ["conv_layers"]},
    "lin_1": {"class": "linear", "activation": None, "n_out": 512, "from": ["lstm1_fw", "lstm1_bw"]},
    "ff_2": {"class": "linear", "activation": "relu", "n_out": 2000, "from": ["lin_1"]},
    "ff_3": {"class": "linear", "activation": "relu", "n_out": 2000, "from": ["ff_2"]},
    "ff_4": {"class": "linear", "activation": "relu", "n_out": 2000, "from": ["ff_3"]},
    "output": {"class": "softmax", "loss": "ce", "from": ["ff_4"]}
}

I think this would be my preferred option in your case.

Otherwise, there is the custom_param_importer option, which is available on every layer, and you might get it to work with that.
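
For illustration only, a minimal sketch of such an importer. The name import_conv1_params is hypothetical, and the keyword arguments (layer, values_dict, session) are my assumption about how RETURNN invokes custom_param_importer; check LayerBase.set_param_values_by_dict in the RETURNN source before relying on this:

def import_conv1_params(layer=None, values_dict=None, session=None, **kwargs):
    # Hypothetical sketch: copy every param that is present in the loaded
    # values dict into the corresponding variable of this layer.
    # The argument names are an assumption about RETURNN's callback signature.
    for param_name, param in layer.params.items():
        if param_name in values_dict:
            param.load(values_dict[param_name], session=session)  # TF1-style tf.Variable.load

network["conv_1"]["custom_param_importer"] = import_conv1_params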

Then, for many layers, you can define the initializer for the params, e.g. for ConvLayer you can use forward_weights_init. There, functions like load_txt_file_initializer could be used, or maybe a similar function should be added to load directly from a TF checkpoint file.
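
As a rough sketch of that variant: my_conv1_weights.txt is a placeholder filename, and I am assuming the string form is evaluated by RETURNN's initializer handling with load_txt_file_initializer in scope.

network["conv_1"] = {
    "class": "conv", "filter_size": (400,), "activation": "abs", "padding": "valid", "strides": 10, "n_out": 64,
    # Placeholder filename; load_txt_file_initializer expects a text dump of the weights.
    "forward_weights_init": "load_txt_file_initializer(filename='my_conv1_weights.txt')",
}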

Upvotes: 1
