GCC generates undesired assembly code

Question

I'm working on vectorizing loops, and GCC is giving me a hard time. When I look at the assembly code it generates, I see a lot of strange lines that I would like to get rid of.

For example, with vectorization, I've learnt that you can avoid a lot of extra assembly lines by giving GCC additional information about array alignment: http://locklessinc.com/articles/vectorize/
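As a rough illustration of what "giving alignment information" can look like, here is a minimal sketch using the GCC builtin `__builtin_assume_aligned`. The function name `or_aligned`, the length parameter `n`, and the 16-byte figure are assumptions made for this example, not something taken from the article. The promise only holds if the caller really passes 16-byte-aligned pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (name and signature are illustrative only).
// __builtin_assume_aligned returns its first argument and lets the
// optimizer assume that the returned pointer has the stated alignment.
void or_aligned(const std::uint16_t *a, const std::uint16_t *b,
                std::uint16_t *out, std::size_t n) {
    // Debug-build check of the promise: a misaligned pointer here would
    // make the assumption below undefined behavior.
    assert(reinterpret_cast<std::uintptr_t>(a) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(b) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(out) % 16 == 0);

    const std::uint16_t *pa =
        static_cast<const std::uint16_t *>(__builtin_assume_aligned(a, 16));
    const std::uint16_t *pb =
        static_cast<const std::uint16_t *>(__builtin_assume_aligned(b, 16));
    std::uint16_t *po =
        static_cast<std::uint16_t *>(__builtin_assume_aligned(out, 16));

    for (std::size_t i = 0; i < n; ++i) {
        po[i] = static_cast<std::uint16_t>(pa[i] | pb[i]);
    }
}
```

A caller would typically declare the buffers with `alignas(16)` (or allocate them with an aligned allocator) so that the promise is actually true.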

Here is my experiment.

#define SIZE 1024
void itwillwork (const uint16_t * a, const  uint16_t * b, uint16_t * comp) {
    int i = 0;
    comp[i]=a[i]|b[i];
}

Generates simple assembly:

.globl  _ZN8Test_LUT7performEv
_ZN8Test_LUT7performEv:
.LFB664:
    .cfi_startproc
    movq    16(%rdi), %rax
    movq    8(%rdi), %rcx
    movq    32(%rdi), %rdx
    movzwl  (%rax), %eax
    orw     (%rcx), %ax
    movw    %ax, (%rdx)
    ret
    .cfi_endproc

I was expecting a few extra lines, but I am very surprised by what I got after adding a loop:

#define SIZE 1024
void itwillwork (const uint16_t * a, const  uint16_t * b, uint16_t * comp) {
    int i = 0;
    for(i=0;i<SIZE;i++) {
        comp[i]=a[i]|b[i];
    }
}

Generates this assembly with a lot more lines:

_Z10itwillworkPKtS0_Pt:
.LFB663:
    .cfi_startproc
    leaq    16(%rdx), %rax
    leaq    16(%rsi), %rcx
    cmpq    %rsi, %rax
    setbe   %r8b
    cmpq    %rcx, %rdx
    setnb   %cl
    orb     %cl, %r8b
    je      .L55
    cmpq    %rdi, %rax
    leaq    16(%rdi), %rax
    setbe   %cl
    cmpq    %rax, %rdx
    setnb   %al
    orb     %al, %cl
    je      .L55
    xorl    %eax, %eax
    .p2align 4,,10
    .p2align 3
.L57:
    movdqu  (%rsi,%rax), %xmm1
    movdqu  (%rdi,%rax), %xmm0
    por     %xmm1, %xmm0
    movdqu  %xmm0, (%rdx,%rax)
    addq    $16, %rax
    cmpq    $2048, %rax
    jne     .L57
    rep ret
    .p2align 4,,10
    .p2align 3
.L55:
    xorl    %eax, %eax
    .p2align 4,,10
    .p2align 3
.L58:
    movzwl  (%rsi,%rax), %ecx
    orw     (%rdi,%rax), %cx
    movw    %cx, (%rdx,%rax)
    addq    $2, %rax
    cmpq    $2048, %rax
    jne     .L58
    rep ret
    .cfi_endproc


Both were compiled with GCC 4.8.4 in release mode (-O2 -ftree-vectorize -msse2).

Can somebody help me get rid of those lines? Or, if that's impossible, can you tell me why they are there?
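One possible reading of the `leaq`/`cmpq`/`setbe`/`setnb` sequence at the top of the loop listing is a runtime test for whether `comp` starts within 16 bytes of `a` or `b`, followed by a choice between the `movdqu` loop (.L57) and the element-by-element loop (.L58). The sketch below renders that reading in C++. The function names are hypothetical, and the code is an interpretation of the listing, not compiler output.

```cpp
#include <cstddef>
#include <cstdint>

// True when the `width`-byte windows starting at p and q do not overlap.
static bool windows_disjoint(const void *p, const void *q, std::size_t width) {
    const std::uintptr_t x = reinterpret_cast<std::uintptr_t>(p);
    const std::uintptr_t y = reinterpret_cast<std::uintptr_t>(q);
    return x + width <= y || y + width <= x;
}

// Hypothetical rendering of the two-path code. Returns true when the wide
// path was taken. Assumes n is a multiple of 8 (as SIZE = 1024 is).
bool or_with_overlap_check(const std::uint16_t *a, const std::uint16_t *b,
                           std::uint16_t *comp, std::size_t n) {
    if (windows_disjoint(comp, b, 16) && windows_disjoint(comp, a, 16)) {
        // Stands in for the movdqu/por/movdqu loop: 16 bytes per step,
        // both loads completed before the store.
        for (std::size_t i = 0; i < n; i += 8) {
            std::uint16_t tmp[8];
            for (std::size_t k = 0; k < 8; ++k) {
                tmp[k] = static_cast<std::uint16_t>(a[i + k] | b[i + k]);
            }
            for (std::size_t k = 0; k < 8; ++k) {
                comp[i + k] = tmp[k];
            }
        }
        return true;
    }
    // Stands in for the movzwl/orw/movw loop: one element per step.
    for (std::size_t i = 0; i < n; ++i) {
        comp[i] = static_cast<std::uint16_t>(a[i] | b[i]);
    }
    return false;
}
```

Under this reading, the scalar loop is a fallback that keeps the result correct when the output overlaps an input, which the function signature alone does not rule out.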

Update:

I've tried the tricks from http://locklessinc.com/articles/vectorize/, but I ran into another issue:

#define SIZE 1024
void itwillwork (const uint16_t * a, const  uint16_t * b, uint16_t * comp) {
    int i = 0;
    for(i=0;i


Only a few assembly lines are generated for this function, which I understand.
But when I call this function from somewhere else:

itwillwork(a,b,c);


There is no call instruction: the long list of instructions from "itwillwork" (the same as above) is emitted directly at the call site.
Am I missing something? (The "extra lines" are the problem, not the inlining.)
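For reference, assuming the extra lines are GCC guarding against `comp` overlapping `a` or `b`, one commonly used hint is the `__restrict__` qualifier (a GCC extension in C++), which promises that the pointers do not alias. The sketch below uses a hypothetical name, `itwillwork_restrict`, and an added length parameter `n`. Whether GCC 4.8.4 keeps this promise once the function is inlined at the call site is not something the sketch verifies.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical variant of the function in the question.
// __restrict__ tells the compiler that, within this function, the three
// pointers never refer to overlapping memory. Calling it with overlapping
// buffers is undefined behavior.
void itwillwork_restrict(const std::uint16_t *__restrict__ a,
                         const std::uint16_t *__restrict__ b,
                         std::uint16_t *__restrict__ comp,
                         std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        comp[i] = static_cast<std::uint16_t>(a[i] | b[i]);
    }
}
```

Comparing the output of `-S` for this variant against the original would show whether the comparisons before the loop disappear with a given compiler version.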
