I have a Xilinx SoC, and have created a simple multiplier in the programmable logic using Verilog. The multiplier takes two 16-bit inputs, multiplies them, and returns a 32-bit output. The digital design has been packaged and linked to the processing system within the SoC via an AXI-Lite interface. The Xilinx tools have auto-generated a device tree node for this design so that a custom Linux device driver can be written to interact with it (i.e., the PS will treat it just like an external hardware device connected to the ARM processor).
The generated device tree looks like this:
/ {
    amba_pl: amba_pl@0 {
        #address-cells = <2>;
        #size-cells = <2>;
        compatible = "simple-bus";
        ranges ;
        multi2_0: multi2@a0000000 {
            clock-names = "s00_axi_aclk";
            clocks = <&zynqmp_clk 71>;
            compatible = "xlnx,multi2-1.0";
            reg = <0x0 0xa0000000 0x0 0x10000>;
            xlnx,s00-axi-addr-width = <0x4>;
            xlnx,s00-axi-data-width = <0x20>;
        };
    };
};
So from the device tree we can see that the multiplier ("multi2-1.0") is at physical address 0xa0000000 (with a 0x10000-byte region), with an address width of 0x4 and a data width of 0x20, i.e. 32 bits.
So, from the device driver's point of view, specifically in the write callback function, I am writing a 32-bit number to the virtual address returned by ioremap().
A sanity check was done on the mapping from virtual to physical address, and it appears to be correct with no errors (some memory-related code snippets from the driver are shown below):
struct simpmod_local {
    int irq;
    unsigned long mem_start;
    unsigned long mem_end;
    void __iomem *base_addr;
};
struct simpmod_local *lp = NULL;
......
static int simpmod_probe(struct platform_device *pdev)
{
    .....
    lp->base_addr = ioremap(lp->mem_start, lp->mem_end - lp->mem_start + 1);
    ...
    dev_info(dev, "simpmod at 0x%08x mapped to 0x%08x, irq=%d\n",
             (unsigned int __force)lp->mem_start,
             (unsigned int __force)lp->base_addr,
             lp->irq);
    ....
}
The write callback function (as of now) just takes a 32-bit number and places it into memory. However, the read callback function reads back that exact same number even though I am reading from base_addr + 0x20 offset. I have tried changing the offset value, but regardless, it keeps reading the same number.
My intuition tells me that, when reading from a different memory address, the value should be either garbage or zero; it's very unlikely to be the same value that was written to the base address. Any ideas why the written data appears to be copied across the entire allocated memory space?
Even running <devmem 0xa0000000 w 52> will produce the output 52 when executing <devmem 0xa0000020 w> or <devmem 0xa0000040 w> or ......
The write callback function looks like this:
static ssize_t dev_write(struct file *fil, const char *buf, size_t len, loff_t *off){
    sscanf(buf, "%d,%d", &operand_1, &operand_2);
    ker_buf[len] = 0;
    iowrite32((unsigned int) operand_1, lp->base_addr);
    return len;
}
The full project code (with minor changes) can be found at https://forums.xilinx.com/t5/Embedded-Linux/Memory-Replications-during-write-call-back-function-in-Linux/m-p/1212405
Caveat: This isn't so much a solution as some observations and things to try [in no particular order].
At present, you've got multiple potential sources of error: bad H/W logic or an incorrect device driver.
From the linked driver code, most return statements return a negative error code (e.g. -ENOMEM), but some return -1. This is inconsistent: in the kernel, -1 is -EPERM, so a caller would interpret it as "Operation not permitted".
As I mentioned in my comments, you've got a bunch of global variables and no inter-thread locking. So, you could have race conditions.
I presume you're booting petalinux. And, it is working as long as you don't access your device. This is a big deal [in a good way].
I'm assuming that you're communicating with it via a serial cable from your development system [running (e.g.) minicom] to an onboard UART. So, you get a login prompt and/or shell.
This means that the UART driver source [and the corresponding dtb/dts] is available. You can use that as your reference driver. Or, something else like GPIO, etc.
I notice that you mentioned ZYNQ [which is a fairly popular Xilinx FPGA chip]. I'll assume that you're also using a standard SDK board with a ZYNQ chip on it. So, Vivado will already know about the board interconnect/layout.
And, I assume that Vivado is able to pass off the board definition to Xilinx's S/W SDK/builder, so that it can build a compatible petalinux kernel.
I have never seen a case where a value is written and read back correctly, but the data is also replicated throughout memory.
This means that the address-matching logic in your device is responding not merely to its assigned address range, but to many more addresses that it shouldn't. There could also be overlap with other devices, and they could be contending/racing.
I'm no Vivado expert, but ...
From your link, looking at the .png for one of the Vivado windows, it says that the AXI BASEADDR is 0xFFFFFFFF and that AXI HIGHADDR is 0x00000000. Both have a blue i on them.
These are highly suspicious to me because I think these values should match the values in the DTB entry. And, the BASEADDR value makes no sense to me.
I'm wondering if the DTB could be generated to some sane address but the actual H/W logic generated is different.
This could easily cause all the symptoms you're seeing.
One thing that might help is to add chipscope
to the H/W design so you can debug your H/W logic and/or observe any access to a given port/address range.
You're using copy_to_user et al. But, these can fail and you're not checking the return value. I'd also do a printk on the arguments being passed.
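To illustrate the write side: the question's dev_write passes the userspace buf straight to sscanf and sets ker_buf[len] with no bounds check. In the driver, the bytes should first be copied with copy_from_user() (a non-zero return means some bytes weren't copied, so return -EFAULT), then NUL-terminated and parsed. Below is a minimal userspace model of that parsing step, where memcpy stands in for copy_from_user(). The function name parse_operands, the 64-byte buffer, and treating the operands as unsigned 16-bit values are assumptions for illustration; only the "%d,%d" format comes from the question.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define KBUF_SIZE 64 /* hypothetical size of the driver's ker_buf */

/*
 * Parse "a,b" from a buffer of length len (not necessarily NUL-terminated),
 * the way dev_write might after copy_from_user().
 * Returns 0 on success or a negative errno value on failure.
 */
int parse_operands(const char *src, size_t len, uint16_t *op1, uint16_t *op2)
{
    char kbuf[KBUF_SIZE];
    int a, b;

    if (len == 0 || len >= sizeof(kbuf))
        return -EINVAL;            /* reject instead of overflowing kbuf */

    memcpy(kbuf, src, len);        /* in the driver: copy_from_user() */
    kbuf[len] = '\0';              /* safe: len < sizeof(kbuf) */

    if (sscanf(kbuf, "%d,%d", &a, &b) != 2)
        return -EINVAL;            /* both operands must be present */

    if (a < 0 || a > UINT16_MAX || b < 0 || b > UINT16_MAX)
        return -ERANGE;            /* assumed unsigned 16-bit inputs */

    *op1 = (uint16_t)a;
    *op2 = (uint16_t)b;
    return 0;
}
```

In the driver, dev_write would return len on success and pass the negative errno back on failure, rather than always returning len.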
There is no guarantee that the len value passed to dev_read/dev_write is large enough for the transfer size. In dev_read, you do ioread32. But, then you do: int n = sprintf(ker_buf, "%d\n", read_val);
You're not checking n vs len to ensure there's enough room. And, you're not examining/honoring the loff_t offset.
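Here's a userspace model of the read-side bookkeeping: format the result once, copy only what fits in len starting at *off, and advance *off so a second read() returns 0 (EOF). In the kernel, simple_read_from_buffer() does this for you and also returns -EFAULT if copy_to_user() fails. The function name read_result and the buffer size are hypothetical; memcpy stands in for copy_to_user().

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Model of dev_read(): copy the decimal text of `value` into dst,
 * honoring the caller's buffer size `len` and file offset `*off`.
 * Returns bytes copied, 0 at end of data, or -1 on error.
 */
ssize_t read_result(char *dst, size_t len, off_t *off, uint32_t value)
{
    char text[16];  /* "4294967295\n" needs 11 bytes plus NUL */
    int n = snprintf(text, sizeof(text), "%u\n", (unsigned)value);

    if (n < 0 || (size_t)n >= sizeof(text))
        return -1;                 /* formatting failed or was truncated */
    if (*off < 0)
        return -1;                 /* invalid offset */
    if ((size_t)*off >= (size_t)n)
        return 0;                  /* everything already read: EOF */

    size_t avail = (size_t)n - (size_t)*off;
    size_t count = len < avail ? len : avail;  /* never exceed len */

    memcpy(dst, text + *off, count);  /* in the driver: copy_to_user() */
    *off += (off_t)count;
    return (ssize_t)count;
}
```

Returning 0 once the offset reaches the end is what lets tools like cat stop instead of looping on the same value forever.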
Both these functions are passed a struct file pointer. But, this value is ignored in favor of the global variables you've already set. As I mentioned in my top comments, using these globals is problematic. You should use the passed pointer to find the appropriate struct pointers and [ultimately] your private device struct simpmod_local.
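The usual way to get from the struct file to the private struct is container_of(). For example, if a struct miscdevice is embedded in simpmod_local, the misc core sets filp->private_data to that miscdevice on open; with a struct cdev you'd use inode->i_cdev in open. Below is a userspace sketch of the same pointer arithmetic. The struct fake_chardev type, the chardev member, and the to_simpmod() helper are hypothetical stand-ins, not the driver's real layout.

```c
#include <stddef.h>

/* Same idea as the kernel's container_of(): step back from a pointer to a
 * member to the start of the structure that contains it. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the object the file points at
 * (e.g. a struct miscdevice or struct cdev). */
struct fake_chardev {
    int minor;
};

/* Hypothetical private struct with the char device embedded (not a pointer). */
struct simpmod_local {
    int irq;
    void *base_addr;
    struct fake_chardev chardev;
};

/* What dev_read/dev_write would do with filp->private_data. */
struct simpmod_local *to_simpmod(struct fake_chardev *cd)
{
    return container_of(cd, struct simpmod_local, chardev);
}
```

Because every device instance carries its own base_addr, nothing in the read/write path needs a global lp.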
Your dev_write should store the values from userspace into the private struct. The dev_read should get them from there.
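A minimal sketch of keeping the operands and result in per-device state guarded by a lock, which also addresses the missing inter-thread locking mentioned earlier. It's written with pthreads so it runs in userspace; in the driver the fields would live in struct simpmod_local and the lock would be a struct mutex taken with mutex_lock(). The names mult_state, mult_write and mult_read are hypothetical, and the multiplication in C only stands in for the hardware access.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical per-device state (in the driver: part of simpmod_local). */
struct mult_state {
    pthread_mutex_t lock;   /* in the driver: struct mutex */
    uint16_t op1, op2;
    uint32_t result;
};

void mult_write(struct mult_state *s, uint16_t a, uint16_t b)
{
    pthread_mutex_lock(&s->lock);
    s->op1 = a;
    s->op2 = b;
    /* Stand-in for iowrite32() of the operands and ioread32() of the result. */
    s->result = (uint32_t)a * b;
    pthread_mutex_unlock(&s->lock);
}

uint32_t mult_read(struct mult_state *s)
{
    pthread_mutex_lock(&s->lock);
    uint32_t r = s->result;
    pthread_mutex_unlock(&s->lock);
    return r;
}
```

Note that the lock only makes each call atomic; a write() from one process can still land between another process's write() and read(), which is why a single combined call is attractive.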
Here's a total guess: Most designs I've seen use full AXI rather than AXI-Lite. I know nothing about what constitutes an "AXI thread ID", so I don't know what the implications of your access code bouncing between cores might be [if anything].
Using dev_write/dev_read as you're doing isn't atomic. I think, at present, you've got more fundamental issues. But, long term, I'd replace this with an ioctl call that takes a struct, such as:
struct mymult_user {
    u32 operand_1;
    u32 operand_2;
    u32 result;
};
The ioctl call does copy_from_user on this, sends these values to the H/W, gets back the result, and returns the result to the ioctl caller. Or, it can do a copy_to_user on the result field in the struct.
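From the userspace side, that ioctl could look roughly like this. The command number (magic 'M', number 1) and the helper name mymult_multiply are hypothetical; the driver would have to define the same MYMULT_IOC_MUL in a shared header and implement it in its unlocked_ioctl handler with copy_from_user/copy_to_user on the struct.

```c
#include <stdint.h>
#include <sys/ioctl.h>

/* Userspace mirror of the struct above (u32 -> uint32_t). */
struct mymult_user {
    uint32_t operand_1;
    uint32_t operand_2;
    uint32_t result;
};

/* Hypothetical command number; _IOWR marks the argument as both
 * written to and read back from the driver. */
#define MYMULT_IOC_MUL _IOWR('M', 1, struct mymult_user)

/* Send both operands in one call and get the product back.
 * Returns 0 on success, -1 with errno set on failure. */
int mymult_multiply(int fd, uint32_t a, uint32_t b, uint32_t *result)
{
    struct mymult_user req = { .operand_1 = a, .operand_2 = b };

    if (ioctl(fd, MYMULT_IOC_MUL, &req) < 0)
        return -1;
    *result = req.result;
    return 0;
}
```

A caller would open the device node (whatever name the driver registers) and call mymult_multiply(fd, 3, 4, &r); the operands and result travel together, so no other caller can interleave between them.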
Overall, you're more likely to get a [useful] response on Xilinx's forum page [as it's frequented by people who do this stuff all the time].
UPDATE:
Something else I noticed.
The DTB entry specifies the AXI data width as 0x20. That is 32, i.e. a 32-bit (4-byte) AXI data bus, which matches the 32-bit interface in the question, so it's probably not an issue by itself.
But, looking at the driver, the offsets from the base address don't seem to match up.
operand_1 is offset 0x10, operand_2 is offset 0x20, and the result is offset 0x30. So, what's at offset 0x0???
The width of the AXI bus and the width of the registers may not be strictly related. But, with a 32-bit data bus, I'd ordinarily expect 32-bit registers to be closely packed at offsets 0x0, 0x4, 0x8, respectively. And, an AXI address width of 0x4 gives the IP only 16 bytes (0x0-0xF) of register space, so offsets 0x10, 0x20 and 0x30 are all beyond it.
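To see how offsets outside a 4-bit address space could all land on the same register, here's a small sketch. It assumes (not verified against your IP) that the slave only sees the low xlnx,s00-axi-addr-width address bits and has 4-byte registers; the function name reg_index is hypothetical.

```c
#include <stdint.h>

/*
 * Which 32-bit register does a byte offset hit, assuming the IP only sees
 * the low `addr_width` address bits and registers are 4 bytes apart?
 */
unsigned reg_index(uint32_t offset, unsigned addr_width)
{
    uint32_t mask = (1u << addr_width) - 1;  /* 4 bits -> 0xF */
    return (unsigned)((offset & mask) >> 2); /* 4 bytes per register */
}
```

Under that assumption, offsets 0x10, 0x20, 0x30 and 0x40 all fold back onto register 0, which would be consistent with devmem showing the same value at every 0x20 step.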
It might be less painful [less chance of memory/bus corruption] to just do ioread* while debugging. Since you're not writing to the address space, it's less likely to corrupt other memory cells, and the system may stay alive [uncorrupted] longer. This would only give you whatever value was in the result register initially.
Also, you could write the operands and then loop on ioread32 for offsets (e.g.) 0x0-0x40, and printk those values.
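A sketch of that dump loop. In the driver, each load would be ioread32(lp->base_addr + off) and the print a printk/dev_info; here `base` is any readable mapping, for example a pointer from mmap()ing /dev/mem at 0xa0000000, which is essentially what devmem does. The function name dump_regs is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Read and print every 32-bit word from offset 0 up to and including
 * max_off. Optionally stores the values in `out`.
 * Returns the number of words read.
 */
size_t dump_regs(const volatile uint32_t *base, uint32_t max_off, uint32_t *out)
{
    size_t n = 0;

    for (uint32_t off = 0; off <= max_off; off += 4) {
        uint32_t v = base[off / 4];   /* in the driver: ioread32() */
        printf("offset 0x%02x: 0x%08x\n", (unsigned)off, (unsigned)v);
        if (out)
            out[n] = v;
        n++;
    }
    return n;
}
```

If the same value shows up at a regular stride (every 0x10, say), that points at address aliasing in the IP rather than at the driver.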