Reputation: 187
I have this code and I'm wondering why the compiler doesn't merge the two character comparisons into a single 16-bit comparison.
inline bool beginswith (char c0, char c1, const char* s) {
  return (s[0] == c0) & (s[1] == c1);
}
bool test(const char* s) {
  return beginswith('0', 'x', s);
}
I would expect the two versions in my Godbolt link (test and test2) to compile to equivalent instructions. Am I missing something? This seems like a pretty trivial optimization.
Edit: The reason I would like the compiler to do this optimization is so I can write portable code with no undefined behavior and still get full performance.
Upvotes: 4
Views: 220
Reputation: 213989
It appears that the compilers simply aren't smart enough to treat inlining as a special case here. They generate the very same code as they do when the function has external linkage, in which case there is no connection assumed between the parameters c0 and c1.
Inside the function, c0 and c1 reside in local copies according to the ABI calling convention, which doesn't necessarily place them next to each other for aligned access, even though that does happen to be the case in your specific x86 example.
Changing the caller code to alignas(int) char str[2] = {'0', 'x'}; return beginswith(str[0],str[1],s); doesn't make a difference, so alignment doesn't seem to be the (sole) reason either.
However, if you change the function to this:
#include <cstring>

inline bool beginswith (char c0, char c1, const char* s)
{
  char tmp[2] = {c0, c1};
  // memcmp returns 0 when both bytes match, so compare against 0
  return std::memcmp(tmp, s, 2) == 0;
}
Then you get identical machine code in both cases (godbolt).
test(char const*):
  cmp WORD PTR [rdi], 30768
  sete al
  ret
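The constant 30768 in the cmp instruction is just the two prefix characters packed into one little-endian 16-bit value. A minimal sketch of that arithmetic, assuming an ASCII execution character set; pack_le is a hypothetical helper name, not something from the question:

```cpp
#include <cstdint>

// Hypothetical helper: builds the 16-bit value that a little-endian load of
// the bytes {c0, c1} would produce (c0 in the low byte, c1 in the high byte).
constexpr std::uint16_t pack_le(char c0, char c1) {
    return static_cast<std::uint16_t>(
        static_cast<unsigned char>(c0) |
        (static_cast<unsigned char>(c1) << 8));
}

// Assuming ASCII: '0' == 0x30 and 'x' == 0x78, so the pair packs to 0x7830.
static_assert(pack_le('0', 'x') == 0x7830, "packed prefix");
static_assert(0x7830 == 30768, "same constant as in the cmp instruction");
```

So the single WORD compare tests s[0] == '0' and s[1] == 'x' at once on x86.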
It should be noted that *(const short*)s in test2 is undefined behavior: a strict aliasing violation. The machine code generated for that line may change when the program scales up or when optimizer settings are changed. This is mostly a quality-of-implementation matter: the standard makes no guarantees, and test2 might break at any point.
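If an explicit 16-bit compare is wanted without the pointer cast, one common well-defined alternative is to copy the bytes into an integer with memcpy. The following is only a sketch under that assumption; load_u16 and beginswith_u16 are made-up names, and whether the optimizer folds the copies into a single load is up to the compiler:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads two bytes from s into a 16-bit integer.
// std::memcpy copies raw bytes, so this has neither the aliasing nor the
// alignment problem of *(const short*)s.
inline std::uint16_t load_u16(const char* s) {
    std::uint16_t v;
    std::memcpy(&v, s, sizeof v);
    return v;
}

// Same interface as beginswith in the question. Both sides are loaded the
// same way, so the comparison does not depend on the target's endianness.
inline bool beginswith_u16(char c0, char c1, const char* s) {
    const char prefix[2] = {c0, c1};
    return load_u16(prefix) == load_u16(s);
}
```

Like the original version, this reads s[1] unconditionally, so s must point to at least two readable bytes.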
Upvotes: 1