Reputation: 121
I keep getting a CUDA out of memory error when trying to fine-tune a Hugging Face pretrained XLM-RoBERTa model. So, the first thing I want to find out is the size of the pretrained model.
model = XLMRobertaForCausalLM.from_pretrained('xlm-roberta-base', config=config)
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)
I have tried to get the size of the model with
sys.getsizeof(model)
and, unsurprisingly, I get an incorrect result: 56, which is just the size of the Python object itself.
But then I tried model.element_size(), and I get the error
ModuleAttributeError: 'XLMRobertaForCausalLM' object has no attribute 'element_size'
I have searched the Hugging Face documentation, but I have not found how to do this. Does anyone know how?
Upvotes: 6
Views: 9626
Reputation: 83157
To get the size, the number of parameters and the data type of each layer of a Hugging Face pretrained model:
import torch
from transformers import AutoModel
model_name = "xlm-roberta-base"
model = AutoModel.from_pretrained(model_name)
def get_layer_sizes(model):
    layer_sizes = {}
    total_size = 0
    for name, param in model.named_parameters():
        # numel() returns the number of elements, element_size() the size in bytes of each element
        layer_size = param.numel() * param.element_size()
        total_size += layer_size
        layer_sizes[name] = (param.numel(), layer_size, param.dtype)
    return layer_sizes, total_size

layer_sizes, total_size = get_layer_sizes(model)
for name, size in layer_sizes.items():
    print(f"Layer: {name}; Number of parameters: {size[0]:,} ({size[2]}); Size: {size[1] / (1024 ** 2):.2f} MiB")
print(f"Total Model Size: {total_size / (1024 ** 2):.2f} MiB")
outputs:
Layer: embeddings.word_embeddings.weight; Number of parameters: 192,001,536 (torch.float32); Size: 732.43 MiB
Layer: embeddings.position_embeddings.weight; Number of parameters: 394,752 (torch.float32); Size: 1.51 MiB
Layer: embeddings.token_type_embeddings.weight; Number of parameters: 768 (torch.float32); Size: 0.00 MiB
Layer: embeddings.LayerNorm.weight; Number of parameters: 768 (torch.float32); Size: 0.00 MiB
Layer: embeddings.LayerNorm.bias; Number of parameters: 768 (torch.float32); Size: 0.00 MiB
Layer: encoder.layer.0.attention.self.query.weight; Number of parameters: 589,824 (torch.float32); Size: 2.25 MiB
Layer: encoder.layer.0.attention.self.query.bias; Number of parameters: 768 (torch.float32); Size: 0.00 MiB
Layer: encoder.layer.0.attention.self.key.weight; Number of parameters: 589,824 (torch.float32); Size: 2.25 MiB
Layer: encoder.layer.0.attention.self.key.bias; Number of parameters: 768 (torch.float32); Size: 0.00 MiB
Layer: encoder.layer.0.attention.self.value.weight; Number of parameters: 589,824 (torch.float32); Size: 2.25 MiB
Layer: encoder.layer.0.attention.self.value.bias; Number of parameters: 768 (torch.float32); Size: 0.00 MiB
Layer: encoder.layer.0.attention.output.dense.weight; Number of parameters: 589,824 (torch.float32); Size: 2.25 MiB
Layer: encoder.layer.0.attention.output.dense.bias; Number of parameters: 768 (torch.float32); Size: 0.00 MiB
Layer: encoder.layer.0.attention.output.LayerNorm.weight; Number of parameters: 768 (torch.float32); Size: 0.00 MiB
Layer: encoder.layer.0.attention.output.LayerNorm.bias; Number of parameters: 768 (torch.float32); Size: 0.00 MiB
Layer: encoder.layer.0.intermediate.dense.weight; Number of parameters: 2,359,296 (torch.float32); Size: 9.00 MiB
Layer: encoder.layer.0.intermediate.dense.bias; Number of parameters: 3,072 (torch.float32); Size: 0.01 MiB
Layer: encoder.layer.0.output.dense.weight; Number of parameters: 2,359,296 (torch.float32); Size: 9.00 MiB
Layer: encoder.layer.0.output.dense.bias; Number of parameters: 768 (torch.float32); Size: 0.00 MiB
Layer: encoder.layer.0.output.LayerNorm.weight; Number of parameters: 768 (torch.float32); Size: 0.00 MiB
Layer: encoder.layer.0.output.LayerNorm.bias; Number of parameters: 768 (torch.float32); Size: 0.00 MiB
... (encoder layers 1 through 10 repeat the same per-layer pattern as layer 0) ...
Layer: encoder.layer.11.attention.self.query.weight; Number of parameters: 589,824 (torch.float32); Size: 2.25 MiB
Layer: encoder.layer.11.attention.self.query.bias; Number of parameters: 768 (torch.float32); Size: 0.00 MiB
Layer: encoder.layer.11.attention.self.key.weight; Number of parameters: 589,824 (torch.float32); Size: 2.25 MiB
Layer: encoder.layer.11.attention.self.key.bias; Number of parameters: 768 (torch.float32); Size: 0.00 MiB
Layer: encoder.layer.11.attention.self.value.weight; Number of parameters: 589,824 (torch.float32); Size: 2.25 MiB
Layer: encoder.layer.11.attention.self.value.bias; Number of parameters: 768 (torch.float32); Size: 0.00 MiB
Layer: encoder.layer.11.attention.output.dense.weight; Number of parameters: 589,824 (torch.float32); Size: 2.25 MiB
Layer: encoder.layer.11.attention.output.dense.bias; Number of parameters: 768 (torch.float32); Size: 0.00 MiB
Layer: encoder.layer.11.attention.output.LayerNorm.weight; Number of parameters: 768 (torch.float32); Size: 0.00 MiB
Layer: encoder.layer.11.attention.output.LayerNorm.bias; Number of parameters: 768 (torch.float32); Size: 0.00 MiB
Layer: encoder.layer.11.intermediate.dense.weight; Number of parameters: 2,359,296 (torch.float32); Size: 9.00 MiB
Layer: encoder.layer.11.intermediate.dense.bias; Number of parameters: 3,072 (torch.float32); Size: 0.01 MiB
Layer: encoder.layer.11.output.dense.weight; Number of parameters: 2,359,296 (torch.float32); Size: 9.00 MiB
Layer: encoder.layer.11.output.dense.bias; Number of parameters: 768 (torch.float32); Size: 0.00 MiB
Layer: encoder.layer.11.output.LayerNorm.weight; Number of parameters: 768 (torch.float32); Size: 0.00 MiB
Layer: encoder.layer.11.output.LayerNorm.bias; Number of parameters: 768 (torch.float32); Size: 0.00 MiB
Layer: pooler.dense.weight; Number of parameters: 589,824 (torch.float32); Size: 2.25 MiB
Layer: pooler.dense.bias; Number of parameters: 768 (torch.float32); Size: 0.00 MiB
Total Model Size: 1060.65 MiB
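If you only need the total, a shorter check is to sum over the parameters directly. This is a minimal sketch; it also adds registered buffers (e.g. position ids), which are negligible for this model but can matter for others:

param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
buffer_bytes = sum(b.numel() * b.element_size() for b in model.buffers())  # non-trainable buffers
print(f"Total: {(param_bytes + buffer_bytes) / 1024 ** 2:.2f} MiB")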
Upvotes: 0
Reputation: 391
The size of the pretrained weights can be found on the model's page on the Hugging Face Hub, under "Files and versions", by checking e.g. pytorch_model.bin
. For BERT this gives ~440 MB: https://huggingface.co/bert-base-uncased/tree/main
Note that the model that ends up on the GPU may be smaller or larger than this.
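If you want to check the file sizes programmatically instead of in the browser, a sketch along these lines should work (assuming a recent huggingface_hub client; files_metadata=True asks the Hub to include per-file sizes):

from huggingface_hub import HfApi

info = HfApi().model_info("bert-base-uncased", files_metadata=True)
for f in info.siblings:
    if f.size is not None:  # size is only populated when files_metadata=True is passed
        print(f"{f.rfilename}: {f.size / 1024 ** 2:.1f} MB")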
Upvotes: 0
Reputation: 121992
Size is a little ambiguous here, so here are answers for both interpretations.
To get the number of parameters, use the .num_parameters()
function, e.g.
from transformers import MarianMTModel
model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-de")
model.num_parameters()
[out]:
74410496
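num_parameters() also takes flags to narrow the count, which is handy when parts of the model are frozen (these keyword arguments exist on PreTrainedModel in recent transformers versions; check your version if unsure):

model.num_parameters(only_trainable=True)      # only parameters with requires_grad=True
model.num_parameters(exclude_embeddings=True)  # ignore the embedding matrices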
To see how much GPU memory the model occupies once loaded, first install this:
pip install -U nvidia-ml-py3
Then in code:
from pynvml import *
from transformers import MarianMTModel
def print_gpu_utilization():
    nvmlInit()
    handle = nvmlDeviceGetHandleByIndex(0)
    info = nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU memory occupied: {info.used//1024**2} MB.")
model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-de")
model.to('cuda')
print_gpu_utilization()
[out]:
GPU memory occupied: 884 MB.
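As an alternative that needs no extra package, PyTorch can report how much memory its own tensors occupy on the GPU. Note this excludes the CUDA context overhead that nvidia-ml counts, so the number is smaller than the nvml figure above:

import torch

print(f"Allocated: {torch.cuda.memory_allocated() / 1024 ** 2:.0f} MiB")  # memory held by tensors
print(f"Reserved:  {torch.cuda.memory_reserved() / 1024 ** 2:.0f} MiB")   # memory held by the caching allocator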
Upvotes: 6
Reputation: 33
If you are facing CUDA out of memory
errors, the problem is usually not the model itself but the training data. You can reduce the batch_size
(the number of training examples processed in parallel) so your GPU only needs to handle a few examples per iteration instead of a large batch.
However, to your question:
I would recommend objsize. It is a library that calculates the "real" size (also known as the "deep" size). So a straightforward solution would be:
import objsize
objsize.get_deep_size(model)
However, the documentation says:
Excluding non-exclusive objects. That is, objects that are also referenced from somewhere else in the program. This is true for calculating the object's deep size and for traversing its descendants.
This shouldn't be a problem, but if it still reports a size that is too small for your model, you can use Pympler, another library that calculates the "deep" size via recursion.
Another approach would be to implement a get_size()
function yourself, e.g. from this article:
import sys
def get_size(obj, seen=None):
    """Recursively finds size of objects"""
    size = sys.getsizeof(obj)
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    # Important: mark as seen *before* entering recursion to gracefully handle
    # self-referential objects
    seen.add(obj_id)
    if isinstance(obj, dict):
        size += sum([get_size(v, seen) for v in obj.values()])
        size += sum([get_size(k, seen) for k in obj.keys()])
    elif hasattr(obj, '__dict__'):
        size += get_size(obj.__dict__, seen)
    elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
        size += sum([get_size(i, seen) for i in obj])
    return size
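One caveat for PyTorch models: sys.getsizeof() on a torch.Tensor only measures the Python wrapper, not the underlying storage, so the recursion above can still undercount badly. A hedged sketch of a variant that special-cases tensors (the name get_size_torch is made up for illustration):

import sys
import torch

def get_size_torch(obj, seen=None):
    """Recursively finds size of objects, counting tensor storage explicitly."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    if isinstance(obj, torch.Tensor):
        # count the elements themselves, not just the Python wrapper object
        return obj.numel() * obj.element_size()
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(get_size_torch(v, seen) for v in obj.values())
        size += sum(get_size_torch(k, seen) for k in obj.keys())
    elif hasattr(obj, '__dict__'):
        size += get_size_torch(obj.__dict__, seen)
    elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
        size += sum(get_size_torch(i, seen) for i in obj)
    return size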
Upvotes: 1