from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
layer = 0
attention_block = model.encoder.layer[layer].attention.self
I want to retrieve the parameters (query, key, value, and attention-output weights) of attention_block for each head separately. For head 0, for example, I have tried:
queries = []
for name, mod in model.named_modules():
    if name == f'encoder.layer.{layer}.attention.self.query':
        queries.append(next(mod.parameters()))  # the query Linear layer's weight

h = 0              # head number
dim_per_head = 64  # hidden size 768 / 12 heads
Q_0 = queries[0][h * dim_per_head : (h + 1) * dim_per_head, :]  # query weights of head 0
and the same for the keys, values and output weights, but I am not sure this is the right way of slicing the tensor queries[0] into per-head blocks. Does anyone have more information about how the parameters are laid out in pretrained BERT?
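For reference, here is a minimal sketch of how the per-head slicing could look when reading the weights directly from the module attributes. It assumes the standard bert-base layout (12 heads of size 64, with nn.Linear weights stored as (out_features, in_features)); it uses a randomly initialised BertModel(BertConfig()) so it runs without downloading, but the weight layout is identical to from_pretrained("bert-base-uncased"). Note that the output projection consumes the concatenated heads, so it is sliced along columns rather than rows:

```python
from transformers import BertConfig, BertModel

# Randomly initialised BERT with the default bert-base config; the
# parameter shapes match BertModel.from_pretrained("bert-base-uncased").
model = BertModel(BertConfig())

layer = 0          # encoder layer
h = 0              # head number
dim_per_head = 64  # hidden size 768 / 12 heads

attn = model.encoder.layer[layer].attention

# query/key/value are nn.Linear layers whose weight has shape
# (out_features, in_features) = (768, 768); head h occupies rows
# h*64:(h+1)*64 of the output dimension.
Q_h = attn.self.query.weight[h * dim_per_head : (h + 1) * dim_per_head, :]
K_h = attn.self.key.weight[h * dim_per_head : (h + 1) * dim_per_head, :]
V_h = attn.self.value.weight[h * dim_per_head : (h + 1) * dim_per_head, :]

# The output projection attn.output.dense maps the concatenated head
# outputs back to the hidden size, so head h corresponds to *columns*
# h*64:(h+1)*64 of its weight.
O_h = attn.output.dense.weight[:, h * dim_per_head : (h + 1) * dim_per_head]

print(tuple(Q_h.shape), tuple(O_h.shape))  # (64, 768) (768, 64)
```

So the row slicing in the question looks right for query, key and value; only the attention-output weight would need column slicing instead.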
Thank you!
Upvotes: 0
Views: 116