Reputation: 321
A bug has recently been "fixed" in a project I work on, but so far nobody has been able to explain why the fix works. (So is it really a fix?) The code is running in kernel space under a realtime system, so the problem causes a complete system lockup. This makes debugging harder than normal, too.
This version crashes the system:
int dups[EMCMOT_MAX_AXIS] = {0};
char *coords = coordinates;
char coord_letter[] = {'X','Y','Z','A','B','C','U','V','W'};
This version does not crash:
int dups[EMCMOT_MAX_AXIS];
char *coords = coordinates;
char coord_letter[] = {'X','Y','Z','A','B','C','U','V','W'};
int i;
for (i=0; i<EMCMOT_MAX_AXIS; i++) {dups[i] = 0;}
To really confuse matters, this experimental version also crashes:
int dups[EMCMOT_MAX_AXIS] = {0};
char *coords = coordinates;
char coord_letter[] = {'X','Y','Z','A','B','C','U','V','W'};
int i;
for (i=0; i<EMCMOT_MAX_AXIS; i++) {dups[i] = 0;}
You can see the commit and the surrounding code here: https://github.com/LinuxCNC/linuxcnc/commit/ef6f36a16c7789af258d34adf4840d965f4c0b10
Upvotes: 2
Views: 184
Reputation: 321
Thanks to Nate Eldredge for setting up the Compiler Explorer and 0andriy for the %xmm0 pointer. This does look like an issue with the compiler using a register that is unsafe in kernel code (or some closely related issue). Experimenting with the Godbolt site, I found that the -mno-sse2 compiler flag has a similar effect to switching to gcc-6: it removes the use of the %xmm0 register in that code. And when added to the compiler flags in the actual application build, it appears to solve the issue. Some more work is likely to be needed to get to the bottom of the right solution, but we seem to have some good pointers now.
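For anyone who wants to reproduce the comparison outside the kernel, the two initialisation styles from the question can be put into a small user-space file and compiled to assembly with and without the flag. This is only a sketch under assumptions: MAX_AXIS, axis_index, has_dup_brace_init, has_dup_loop_init and self_check are hypothetical names made up for illustration, not the LinuxCNC code, and whether a particular gcc emits %xmm0 stores for the = {0} form depends on the compiler version and optimisation level.

```c
#include <assert.h>

#define MAX_AXIS 9 /* hypothetical stand-in for EMCMOT_MAX_AXIS */

static const char coord_letter[MAX_AXIS] =
    {'X', 'Y', 'Z', 'A', 'B', 'C', 'U', 'V', 'W'};

/* Return the index of c in coord_letter, or -1 if c is not an axis letter. */
static int axis_index(char c)
{
    int i;
    for (i = 0; i < MAX_AXIS; i++) {
        if (coord_letter[i] == c)
            return i;
    }
    return -1;
}

/* Zeroes the array with an aggregate initializer (the crashing form).
 * The compiler is free to implement this with vector-register stores. */
int has_dup_brace_init(const char *coords)
{
    int dups[MAX_AXIS] = {0};
    for (; *coords; coords++) {
        int idx = axis_index(*coords);
        if (idx >= 0 && dups[idx]++)
            return 1; /* letter seen before */
    }
    return 0;
}

/* Zeroes the array with an explicit loop (the form that did not crash). */
int has_dup_loop_init(const char *coords)
{
    int dups[MAX_AXIS];
    int i;
    for (i = 0; i < MAX_AXIS; i++)
        dups[i] = 0;
    for (; *coords; coords++) {
        int idx = axis_index(*coords);
        if (idx >= 0 && dups[idx]++)
            return 1; /* letter seen before */
    }
    return 0;
}

/* Both forms must behave identically at the C level; only the
 * generated machine code can differ. */
void self_check(void)
{
    assert(has_dup_brace_init("XYZ") == has_dup_loop_init("XYZ"));
    assert(has_dup_brace_init("XYZX") == has_dup_loop_init("XYZX"));
}
```

Compiling this with something like gcc -O2 -S, and then again with -mno-sse -mno-sse2 added, and searching the output for xmm should show whether the zeroing of dups is being done through an SSE register. The Linux kernel's own x86 build passes -mno-sse, -mno-mmx and -mno-sse2 for the same reason, so an out-of-tree realtime module built with its own flags may need them added explicitly.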
Upvotes: 2