I'm using a TI AM3358 SoC, with an ARM Cortex-A8 processor, running Linux 3.12. I enabled a child device of the GPMC node in the device tree, which probes my driver, and in there I call ioremap_nocache() with the resource provided by the device tree node to get an uncached region.
The reason I'm requesting no cache is that the device connected to the GPMC bus is not actual memory, which would of course benefit from the processor cache, but an FPGA. So every access needs to actually go out on the wires.
When I do this:
u16 __iomem *addr = ioremap_nocache(...);
iowrite16(1, &addr[0]);
iowrite16(1, &addr[1]);
iowrite16(1, &addr[2]);
iowrite16(1, &addr[3]);
ioread16(&addr[0]);
ioread16(&addr[1]);
ioread16(&addr[2]);
ioread16(&addr[3]);
I see the 8 accesses are done on the wires using a logic analyzer. However, when I do this:
u16 v;
addr[0] = 1;
addr[1] = 1;
addr[2] = 1;
addr[3] = 1;
v = addr[0];
v = addr[1];
v = addr[2];
v = addr[3];
I see the four write accesses, but not the subsequent read accesses.
Am I missing something? What would be the difference here between ioread16() and a direct memory access, given that the whole GPMC range is supposed to be addressable just like memory?
Could this behaviour be the result of a compiler optimization that can be avoided? I haven't looked at the generated instructions yet, but in the meantime maybe someone experienced has some insight.
ioread*() and iowrite*() on ARM pair a volatile access with a memory barrier: reads perform the access and then the barrier, writes perform the barrier and then the access, e.g.:
#define readb(c) ({ u8 __v = readb_relaxed(c); __iormb(); __v; })
#define readw(c) ({ u16 __v = readw_relaxed(c); __iormb(); __v; })
#define readl(c) ({ u32 __v = readl_relaxed(c); __iormb(); __v; })
#define writeb(v,c) ({ __iowmb(); writeb_relaxed(v,c); })
#define writew(v,c) ({ __iowmb(); writew_relaxed(v,c); })
#define writel(v,c) ({ __iowmb(); writel_relaxed(v,c); })
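For experimenting outside the kernel, here is a minimal userspace sketch of the same pattern: a volatile "relaxed" access paired with a barrier placed after the load for reads and before the store for writes. The `sketch_*` names are hypothetical, and C11 `atomic_thread_fence()` is only a stand-in for the kernel's `__iormb()`/`__iowmb()`; inside a driver you would simply use `ioread16()`/`iowrite16()` or `readw()`/`writew()`.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace model of ARM readw()/writew(); not kernel code. */

/* Relaxed accessors: exactly one volatile access each, no barrier. */
static inline uint16_t sketch_readw_relaxed(const volatile void *addr)
{
    return *(const volatile uint16_t *)addr;
}

static inline void sketch_writew_relaxed(uint16_t val, volatile void *addr)
{
    *(volatile uint16_t *)addr = val;
}

/* Read: access first, then barrier (same placement as __iormb()). */
static inline uint16_t sketch_readw(const volatile void *addr)
{
    uint16_t v = sketch_readw_relaxed(addr);
    atomic_thread_fence(memory_order_seq_cst); /* stand-in for __iormb() */
    return v;
}

/* Write: barrier first, then access (same placement as __iowmb()). */
static inline void sketch_writew(uint16_t val, volatile void *addr)
{
    atomic_thread_fence(memory_order_seq_cst); /* stand-in for __iowmb() */
    sketch_writew_relaxed(val, addr);
}
```

Here an ordinary array would play the role of the ioremapped window; on the real board `addr` would come from `ioremap_nocache()`.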
__raw_read*() and __raw_write*() (where * is b, w, or l) may be used for direct reads/writes. Each performs exactly the single load or store instruction needed, through the address cast to a volatile pointer, so the compiler cannot drop or merge the access.
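To connect this to the question's code, here is a portable sketch of what the raw accessors boil down to, applied to the same four-write, four-read sequence. `raw_writew_sketch`/`raw_readw_sketch` and `write_then_read4` are hypothetical names, and a plain array stands in for the ioremapped window; the real ARM `__raw_writew()` uses inline assembly, but the volatile qualifier is what forces every access to be emitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable equivalents of __raw_writew()/__raw_readw():
 * one volatile access each, no barrier. */
static inline void raw_writew_sketch(uint16_t val, volatile void *addr)
{
    *(volatile uint16_t *)addr = val;
}

static inline uint16_t raw_readw_sketch(const volatile void *addr)
{
    return *(const volatile uint16_t *)addr;
}

/* The question's sequence: four halfword writes, then four halfword reads.
 * Every access goes through a volatile pointer, so the compiler must emit
 * all eight. With a plain (non-volatile) u16 pointer, as in the question,
 * it may drop reads whose results are unused or reuse the stored values. */
static uint32_t write_then_read4(volatile uint16_t *addr)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < 4; i++)
        raw_writew_sketch(1, &addr[i]);
    for (size_t i = 0; i < 4; i++)
        sum += raw_readw_sketch(&addr[i]);
    return sum;
}
```

In the driver itself, declaring the pointer as `volatile u16 __iomem *` would have a similar effect for direct dereferences, but the accessor functions also keep sparse's `__iomem` checking happy.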
__raw_writew() example (store register, halfword):
#define __raw_writew __raw_writew
static inline void __raw_writew(u16 val, volatile void __iomem *addr)
{
	asm volatile("strh %1, %0"
		     : "+Q" (*(volatile u16 __force *)addr)
		     : "r" (val));
}
Beware, though, that these functions do not insert any barrier, so you should call rmb() (read memory barrier) and wmb() (write memory barrier) wherever you need your memory accesses to be ordered.
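As an illustration of where such a barrier goes, here is a sketch of a common FPGA pattern: fill some data registers with raw writes, then issue a write barrier before the store that tells the device to act on them. The register layout (`REG_DATA0`, `REG_DATA1`, `REG_GO`) and all function names are hypothetical, and `sketch_wmb()` uses C11 `atomic_thread_fence()` as a userspace stand-in for the kernel's `wmb()`.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical register map, as 16-bit word offsets into the window. */
enum { REG_DATA0 = 0, REG_DATA1 = 1, REG_GO = 2 };

/* Userspace stand-in for the kernel's wmb(). */
static inline void sketch_wmb(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* Barrier-free store, like __raw_writew(). */
static inline void raw_writew_sketch(uint16_t val, volatile void *addr)
{
    *(volatile uint16_t *)addr = val;
}

/* Write both data words, then place a barrier so the data stores are
 * ordered before the "go" store that makes the device consume them. */
static void fpga_submit(volatile uint16_t *regs, uint16_t d0, uint16_t d1)
{
    raw_writew_sketch(d0, &regs[REG_DATA0]);
    raw_writew_sketch(d1, &regs[REG_DATA1]);
    sketch_wmb();                 /* data stores before the go store */
    raw_writew_sketch(1, &regs[REG_GO]);
}
```

Only one barrier is needed here because the two data writes may be reordered relative to each other without harm; only their ordering against the go strobe matters.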