Rgkpdx

Reputation: 299

Count parameters for Mistral 7B LLM model

If I load mistralai/Mistral-7B-v0.1 and count its parameters by looping over model.parameters(), I get ~3.7B parameters, but I was obviously expecting ~7B.

  1. What am I doing wrong? (Does the fact that the model lives in two shards affect my calculation?)
  2. Is the memory footprint reported by model.get_memory_footprint() = 4.55 GB reasonable for 7B parameters in 4 bits?
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

base_model_id = "mistralai/Mistral-7B-v0.1"

# Create quantization config 
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load model with quantization
model = AutoModelForCausalLM.from_pretrained(
        base_model_id, quantization_config=bnb_config
)

# Count params
def print_parameters(model):
    all_param = 0
    for param in model.parameters():
        all_param += param.numel()
    print(f"all params: {all_param}")
    
print_parameters(model)
>>> 3752071168
print(model.num_parameters())
>>> 7241732096

Libraries:

transformers==4.36.1
torch==2.0.1
bitsandbytes==0.41.3.post2

Upvotes: 3

Views: 562

Answers (1)

pkoerber

Reputation: 1

It looks like the quantized layers report only half of the expected parameter count because bitsandbytes packs two 4-bit values into each stored element. For example, (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False) shows up with 4096*4096/2 = 8388608 elements when looping over model.parameters(), even though it represents 4096*4096 = 16777216 weights.
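
As a rough cross-check, you can correct the halved count by doubling the element count of any packed 4-bit weight. A minimal sketch (the helper name is my own; it assumes the quantized weights are bitsandbytes.nn.Params4bit tensors storing two 4-bit values per element, which is what the Linear4bit layers above use):

import bitsandbytes as bnb

def count_params_4bit_aware(model):
    # Sum logical parameter counts, unpacking 4-bit weights.
    total = 0
    for param in model.parameters():
        if isinstance(param, bnb.nn.Params4bit):
            # Packed storage: two 4-bit weights per stored element,
            # so the logical count is twice numel().
            total += param.numel() * 2
        else:
            total += param.numel()
    return total

print(count_params_4bit_aware(model))

This should land close to the ~7.24B that model.num_parameters() already reports in the question, since transformers accounts for the 4-bit packing there.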

Upvotes: 0
