Reputation: 79
Our task is intended to demonstrate the benefit of using DMA to copy a large amount of data versus relying on the processor to directly handle the copying. The processor is an STM32F407 on the ST discovery board.
In order to measure the copying time, a GPIO pin must be turned ON during copying and OFF once it has been copied.
The code appears to be functional but it is currently showing the CPU taking about 2.15ms to complete and DMA about 4.5ms, which is the opposite of what is intended. I'm not sure if there simply isn't enough data for the faster speed of DMA to offset the overhead in setting it up perhaps?
I have tried both copying elements of an array using the CPU and also using the memcpy function which seemed to yield very similar times.
The function code is shown below:
DMASpeed(void)
{
#define elementNum 32000
int *ptr = NULL;
ptr = (int*)malloc(elementNum * sizeof(int));
int *ptr2 = NULL;
ptr2 = (int*)malloc(elementNum * sizeof(int));
for (int i = 0; i < elementNum; i++)
{
ptr[i] = 4;
}
LD5_GPIO_Port->BSRR = (uint32_t)LD5_Pin << 16U;
LD6_GPIO_Port->BSRR = (uint32_t)LD6_Pin << 16U;
// Initial value
// printf("BEFORE: dst = '%s'\n", dst);
// Transfer
printf("Initiate DMA Transfer...\n");
HAL_DMA_Start(&hdma_memtomem_dma2_stream0, (int)ptr, (int)ptr2, (elementNum * sizeof(int)));
LD5_GPIO_Port->BSRR = LD5_Pin;
printf("DMA Transfer initiated.\n");
// Poll for DMA completion
printf("Poll for DMA completion.\n");
HAL_DMA_PollForTransfer(&hdma_memtomem_dma2_stream0,
HAL_DMA_FULL_TRANSFER, HAL_MAX_DELAY);
LD5_GPIO_Port->BSRR = (uint32_t)LD5_Pin << 16U;
printf("DMA complete.\n");
// Print result
// printf("AFTER: dst = '%s'\n", dst);
free(ptr);
free(ptr2);
ptr = (int*)malloc(elementNum * sizeof(int));
ptr2 = (int*)malloc(elementNum * sizeof(int));
for (int i = 0; i < elementNum; i++)
{
ptr[i] = i;
}
printf("Initiate CPU Transfer...\n");
LD6_GPIO_Port->BSRR = LD6_Pin;
// for (int i = 0; i<512; i++)
// {
// ptr2[i] = ptr[i];
// }
memcpy(ptr2, ptr, (elementNum * sizeof(int)));
printf("CPU Transfer Complete.\n");
LD6_GPIO_Port->BSRR = (uint32_t)LD6_Pin << 16U;
free(ptr);
free(ptr2);
}
Thanks in advance for any assistance
Upvotes: 5
Views: 2659
Reputation: 1
For those, who google for "How to fasten DMA memory-to-memory transfer?" here is the piece of advice: force your compiler to allocate all HAL code, related to your DMA transfer to the RAM, the best is to the RAM exclusively coupled with the Core. Your compiler will generate function code, which will be copied to the specific RAM at startup, and then all that functions will be called from the RAM and sped up because of it. However, that is also true for copying "by hand". In this case, it is recommended to allocate to the RAM the following files/functions:
Upvotes: -1
Reputation: 67820
you try to proof something what is not the true. DMA memory to memory transfer will be always slower than direct CPU one. DMA was not intended to be faster than the CPU. it's there is to provide the transfer w without the CPU activity in the background. the core has always priority over the DMA.
MEM to MEM DMA transfer will be always slower than the CPU one
There is another problem as well. Many STM devices have memory areas which are not accessible by the DMA (for example CCMRAM).
Upvotes: 10
Reputation: 2547
Remove printf
in below code segment:
LD5_GPIO_Port->BSRR = LD5_Pin;
printf("DMA Transfer initiated.\n"); // <--Remove this
// Poll for DMA completion
printf("Poll for DMA completion.\n"); // <--Remove this
You are turning ON the pin and then printing large text , it is adding up in your total time calculation.
Remove all printf
OR atleast do not print anything in between pin toggling.
EDIT:
To be precise you are printing 50 characters in case of DMA transfer and 23 characters in case of CPU transfer.
Upvotes: 5