Reputation: 57
I was wondering if maximum bandwidth can differ between Programmed I/O and DMA. My confusion comes from the following:
If we have only one bus for the CPU, memory, and the I/O device, and we are using Programmed I/O, does the data go directly from the I/O device to memory when we are reading something, or does it go to the CPU first and then to memory? Meaning, if we can move 10 bytes per transfer and each transfer takes 20 milliseconds, does that mean that with Programmed I/O the max bandwidth is
10 bytes / (20 * 10^-3 s) = 500 bytes per second?
Or do I need to consider the fact that the data goes to the CPU first and then from the CPU to memory?
Upvotes: 1
Views: 4077
Reputation: 363980
If you had a system laid out like you suggest in your question, and CPU <-> Northbridge bandwidth was the bottleneck, then yes, maybe you could get data from an extremely high-bandwidth device into RAM faster with DMA than with PIO. That could plausibly happen in some systems with VERY fast devices relative to the CPU's connection to the outside world.
Of course, modern x86 systems don't even have a Northbridge, because the memory controller is on-chip. That doesn't invalidate the question about computer architecture, but it does render it less relevant. IDK what other chips (like ARM) tend to do. But since high levels of integration are common (including SoC (System on Chip)), I wouldn't be surprised if separate Northbridges are disappearing from the non-x86 world as well.
The difference between PIO and DMA is mostly not about which can achieve the highest bandwidth, but which slows the system down more. Running a copy loop more or less fully occupies a CPU core for the entire duration of the copy. This was an even bigger deal before multi-core CPUs.
A copy loop also pollutes caches, but this can be mostly avoided with special cache-bypassing instructions.
On x86, the `in` and `out` programmed-I/O instructions can't pipeline particularly well. They're not serializing like `cpuid`, but they do drain the store buffer and aren't very friendly to out-of-order execution. This is one reason memory-mapped I/O with regular load/store operations is preferred over `in`/`out` for PIO, e.g. just to write a device's I/O registers to initiate a DMA transfer, or for small transfers where DMA isn't worth it.
Upvotes: 1
Reputation: 269
PIO usually goes through the CPU. Even CISC instructions that copy memory-to-memory still need an instruction slot in the CPU pipeline. The load from the device may also suffer a much larger bus delay than a read from RAM. Factor in loop overhead as well when copying a large amount of data, and it's easy to see why DMA is more efficient.
Upvotes: 2