Reputation: 299
If I load mistralai/Mistral-7B-v0.1 and try to count its parameters by looping over model.parameters(), I get ~3.7B parameters, but I was obviously expecting ~7B.
Also, model.get_memory_footprint() = 4.55GB: does that look reasonable for the 7B params in 4 bits?

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
base_model_id = "mistralai/Mistral-7B-v0.1"
# Create quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
# Load model with quantization
model = AutoModelForCausalLM.from_pretrained(
    base_model_id, quantization_config=bnb_config
)
# Count params
def print_parameters(model):
    all_param = 0
    for param in model.parameters():
        all_param += param.numel()
    print(f"all params: {all_param}")
print_parameters(model)
>>> 3752071168
print(model.num_parameters())
>>> 7241732096
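For reference, the same loop reproduces the num_parameters() figure if the packed 4-bit tensors are doubled. A minimal sketch, assuming bitsandbytes' Params4bit class (which stores two 4-bit values per uint8 element):

import bitsandbytes as bnb

def count_logical_parameters(model):
    total = 0
    for param in model.parameters():
        # Params4bit packs two 4-bit weights into each uint8 element,
        # so its numel() reports half of the logical parameter count.
        if isinstance(param, bnb.nn.Params4bit):
            total += param.numel() * 2
        else:
            total += param.numel()
    return total

print(count_logical_parameters(model))  # should match model.num_parameters()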
Libraries:
transformers==4.36.1
torch==2.0.1
bitsandbytes==0.41.3.post2
Upvotes: 3
Views: 562
Reputation: 1
It looks like the quantized Linear layers report only half of the expected parameters because their 4-bit weights are packed two per uint8 element, e.g. (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False) contributes only 4096*4096/2 = 8388608 elements when looping over model.parameters(). The non-quantized tensors (embeddings, lm_head, and the norm weights, 262,410,240 parameters in total) are still counted at full size, which gives exactly the number you observed: (7,241,732,096 - 262,410,240) / 2 + 262,410,240 = 3,752,071,168. model.num_parameters() corrects for the packing, which is why it reports the full ~7B.
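A quick way to see the packing directly (a sketch against the model loaded in the question; the attribute path follows the Mistral architecture in transformers):

w = model.model.layers[0].self_attn.q_proj.weight
print(w.dtype)    # torch.uint8: two 4-bit values packed per byte
print(w.numel())  # 8388608 == 4096 * 4096 / 2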
Upvotes: 0