Reputation: 135
I need to transfer big blocks of data (~6 MB) from user space to my driver. In the driver, I allocate two 3 MB chunks per block using pci_alloc_consistent(). I then mmap() each block (i.e., both chunks) into a single VMA using vm_insert_page(). This lets user space read/write each block after mmap'ing it. It seems to work, but the performance is not acceptable.
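For context, a minimal sketch of what such an mmap() handler might look like (the names `my_dev`, `chunk[]`, and `CHUNK_SZ` are illustrative placeholders, not from the actual driver):

```c
/* Hypothetical sketch: two 3 MB DMA-coherent chunks mapped into one
 * VMA with vm_insert_page(), one page at a time. */
static int my_mmap(struct file *filp, struct vm_area_struct *vma)
{
    struct my_dev *dev = filp->private_data;
    unsigned long uaddr = vma->vm_start;
    int c, ret;

    for (c = 0; c < 2; c++) {              /* two 3 MB chunks per block */
        void *kva = dev->chunk[c];         /* from pci_alloc_consistent() */
        unsigned long off;

        for (off = 0; off < CHUNK_SZ; off += PAGE_SIZE) {
            struct page *pg = virt_to_page(kva + off);

            ret = vm_insert_page(vma, uaddr, pg);
            if (ret)
                return ret;
            uaddr += PAGE_SIZE;
        }
    }
    return 0;
}
```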
I also implemented another way of writing/reading to/from the memory allocated by pci_alloc_consistent() in the driver: user space calls write(), and the driver uses copy_from_user() to move the contents of each chunk in the block into that memory. I do the opposite (copy_to_user()) for reads.
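A rough sketch of the write() side of this second approach (again with hypothetical names `my_dev`, `chunk[]`, `CHUNK_SZ`; the read path would mirror it with copy_to_user()):

```c
/* Hypothetical sketch: user space write()s a block, and the driver
 * copies it into the two coherent 3 MB chunks with copy_from_user(). */
static ssize_t my_write(struct file *filp, const char __user *buf,
                        size_t count, loff_t *ppos)
{
    struct my_dev *dev = filp->private_data;
    size_t done = 0;

    while (done < count) {
        int c = (*ppos + done) / CHUNK_SZ;        /* which 3 MB chunk */
        size_t off = (*ppos + done) % CHUNK_SZ;   /* offset within it */
        size_t n = min(count - done, CHUNK_SZ - off);

        if (c >= 2)                               /* past end of block */
            break;
        if (copy_from_user(dev->chunk[c] + off, buf + done, n))
            return -EFAULT;
        done += n;
    }
    *ppos += done;
    return done;
}
```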
I found that the first approach was at least 2-3 times slower and used ~40% more CPU. I expected the additional buffer copy in the second case to make it slower; however, that was not the case.
I ran these tests on x86 64-bit platforms, kernels 2.6.* and 3.*.
Do the above results make sense? If yes, can someone please provide some background on what is taking place?
Thanks.
Upvotes: 3
Views: 1147
Reputation: 501
Caching is probably disabled. Did you ioremap_cache() the chunks that you allocated and vm_insert_page()'d? I've come across this kind of problem on x86/x86_64, and it has to do with PAT (Page Attribute Table). You need to ioremap_cache() the physical pages to set the memory type as cacheable, and then call vm_insert_page(). That should fix your performance issue.
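A sketch of the suggested fix, under the assumption that `dma_handle` is the bus address returned by pci_alloc_consistent() and `CHUNK_SZ` is the chunk size (both hypothetical names):

```c
/* Hypothetical sketch: create a cacheable mapping of the chunk's
 * physical range so PAT records the pages as write-back, before the
 * mmap handler inserts them into the VMA with vm_insert_page(). */
void __iomem *cached = ioremap_cache(dma_handle, CHUNK_SZ);
if (!cached)
    return -ENOMEM;
/* ...then vm_insert_page() each page of the chunk as before; the
 * user mapping should now end up cacheable instead of uncached. */
```

Without this, the user-space mapping can end up uncached, which would explain the 2-3x slowdown and extra CPU usage relative to the copy_from_user() path, where the kernel copies through its own cached mapping.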
Upvotes: 3