Reputation: 11
I have profiling results for inference of Meta's Llama 3.1 8B model, which I deployed on an AI accelerator. I captured a memory trace of the whole model transfer from the host to the device, and mapped each layer and each segment of the model to its memory size in order to identify exactly what is being sent to the accelerator.
I have identified every segment of every model layer, but there are still 2 values I cannot account for, even though nothing else is left in the model.
The image shows the size of each layer being sent from the host to the device.
In the image above there are 2 blank cells, each of size 16,777,216 bytes (exactly 16 MiB). I cannot work out what these values are, since I have already accounted for everything that could be in the model. There are biases, but their size is far too small to match this.
I was expecting to analyze the whole model transfer to the AI accelerator and profile the full inference of Llama 3.1 8B.
I tried to map each layer segment to its size, but these 2 values remain unexplained.
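For reference, this is a minimal sketch of the kind of per-tensor size calculation I am comparing the trace against. It assumes BF16 weights (2 bytes per element) and the published Llama 3.1 8B dimensions; the tensor names follow the Hugging Face checkpoint layout and are only illustrative, not the exact labels in my trace:

```python
# Expected tensor sizes for Llama 3.1 8B, assuming BF16 weights (2 bytes/element).
# Dimensions from the published config: hidden=4096, 8 KV heads * head_dim 128 (GQA),
# intermediate=14336, vocab=128256, 32 layers.
BYTES_PER_ELEM = 2  # BF16

HIDDEN = 4096
KV_DIM = 8 * 128          # grouped-query attention: 8 KV heads * head_dim 128
INTERMEDIATE = 14336
VOCAB = 128256

# Per-decoder-layer tensors (element counts)
per_layer_tensors = {
    "q_proj":   HIDDEN * HIDDEN,
    "k_proj":   HIDDEN * KV_DIM,
    "v_proj":   HIDDEN * KV_DIM,
    "o_proj":   HIDDEN * HIDDEN,
    "gate_proj": HIDDEN * INTERMEDIATE,
    "up_proj":   HIDDEN * INTERMEDIATE,
    "down_proj": INTERMEDIATE * HIDDEN,
    "input_layernorm": HIDDEN,
    "post_attention_layernorm": HIDDEN,
}

# Tensors that appear once for the whole model
global_tensors = {
    "embed_tokens": VOCAB * HIDDEN,
    "lm_head":      VOCAB * HIDDEN,
    "final_norm":   HIDDEN,
}

for name, elems in {**per_layer_tensors, **global_tensors}.items():
    print(f"{name:26s} {elems * BYTES_PER_ELEM:>14,d} bytes")
```

None of these expected sizes comes out to 16,777,216 bytes under these assumptions, which is why the 2 blank cells are puzzling me.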
Upvotes: 1
Views: 51