personal_cloud

Reputation: 4524

AVX2 support in GCC 5 and later

I wrote the following class "T" to accelerate manipulations of "sets of characters" using AVX2. Then I found that it doesn't work in gcc 5 and later when I use "-O3". Can anyone help me trace this down to some programming construct that is known not to work on the latest compilers/systems?

How this code works: The underlying structure ("_bits") is a block of 256 bytes (aligned and allocated for AVX2), which can be accessed either as char[256] or AVX2 elements, depending on whether an element is accessed or the whole thing is used in a vector operation. Seems like it should work perfectly well on the AVX2 platform. No?

This is really hard to debug, because "valgrind" says it's clean, and I can't use a debugger (due to the problem disappearing when I remove "-O3"). But I am not happy with just going with the "|=" workaround because if this code is really wrong, then I'm probably making the same mistake in other places and screwing up everything I develop!

It is interesting to note that the "|" operator has the problem but the "|=" does not. Could the problem be related to returning a struct from a function? But I thought that returning a struct has worked since 1990 or something.

// g++ -std=c++11 -mavx2 -O3 gcc_fail.cpp

#include "assert.h"
#include "immintrin.h" // AVX

class T {
public:
  __m256i _bits[8];
  inline bool& operator[](unsigned char c)       {return ((bool*)_bits)[c];}
  inline bool  operator[](unsigned char c) const {return ((bool*)_bits)[c];}
  inline          T()                   {}
  inline explicit T(char const*);
  inline T     operator| (T const& b) const;
  inline T &   operator|=(T const& b);
  inline bool  operator! ()           const;
};

T::T(char const* s)
{
  _bits[0] = _bits[1] = _bits[2] = _bits[3] = _mm256_set1_epi32(0);
  _bits[4] = _bits[5] = _bits[6] = _bits[7] = _mm256_set1_epi32(0);
  char c;
  while ((c = *s++))
    (*this)[c] = true;
}

T T::operator| (T const& b) const
{
  T res;
  for (int i = 0; i < 8; i++)
    res._bits[i] = _mm256_or_si256(_bits[i], b._bits[i]);


  // FIXME why does the above code fail with -O3 in new gcc?
  for (int i=0; i<256; i++)
    assert(res[i] == ((*this)[i] || b[i]));
  // gcc 4.7.0 - PASS
  // gcc 4.7.2 - PASS
  // gcc 4.8.0 - PASS
  // gcc 4.9.2 - PASS
  // gcc 5.2.0 - FAIL
  // gcc 5.3.0 - FAIL
  // gcc 5.3.1 - FAIL
  // gcc 6.1.0 - FAIL


  return res;
}

T & T::operator|=(T const& b)
{
  for (int i = 0; i < 8; i++)
    _bits[i] = _mm256_or_si256(_bits[i], b._bits[i]);
  return *this;
}

bool T::operator! () const
{
  for (int i = 0; i < 8; i++)
    if (!_mm256_testz_si256(_bits[i], _bits[i]))
      return false;
  return true;
}

int Main()
{
  T sep (" ,\t\n");
  T end ("");
  return !(sep|end);
}

int main()
{
  return Main();
}

Upvotes: 3

Views: 742

Answers (1)

Iwillnotexist Idonotexist

Reputation: 13467

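The problem in your code is the use of bool* where unsigned char* is required. Accessing a __m256i object through a bool lvalue violates the strict-aliasing rule, so GCC 5 is entitled to assume that the writes made through operator[] never touch _bits, and it optimizes on that assumption.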
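A minimal sketch of the corrected accessor pattern follows. It is not your class: CharSet, set(), test() and empty() are hypothetical names, and the storage is an array of 64-bit words standing in for __m256i _bits[8] so that the sketch builds without -mavx2. The point it illustrates is only that the per-character access goes through unsigned char*, never bool*.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the question's class T. The words array plays the
// role of __m256i _bits[8]; it is an assumption made so the sketch compiles
// without AVX2 support.
class CharSet {
public:
  alignas(32) std::uint64_t words[32];   // 32 * 8 = 256 bytes, one per character

  CharSet() : words{} {}                 // all 256 bytes start at zero
  explicit CharSet(char const* s) : words{} {
    while (unsigned char c = static_cast<unsigned char>(*s++))
      set(c);
  }

  // Element access goes through unsigned char, which may alias any object.
  void set(unsigned char c) {
    reinterpret_cast<unsigned char*>(words)[c] = 1;
  }
  bool test(unsigned char c) const {
    return reinterpret_cast<unsigned char const*>(words)[c] != 0;
  }

  // Whole-set operation on the wide elements (the _mm256_or_si256 loop in the
  // original class).
  CharSet operator|(CharSet const& b) const {
    CharSet res;
    for (int i = 0; i < 32; i++)
      res.words[i] = words[i] | b.words[i];
    return res;
  }

  bool empty() const {
    for (int i = 0; i < 32; i++)
      if (words[i]) return false;
    return true;
  }
};
```

Applied to your class, the same idea would mean casting _bits to unsigned char* inside operator[] and returning unsigned char& instead of bool&. As a quick diagnostic, GCC's -fno-strict-aliasing switches off this class of optimization; if the assertion stops firing with that flag, an aliasing violation is the likely cause.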

The machine code for the function Main(), as produced by GCC 4.8.5 and by GCC 5.3.1, is listed in the appendix at the end of this answer for reference.

Looking at the code:

Decompilation

After the prologue, T sep's _bits are initialized to zero...

  _bits[0] = _bits[1] = _bits[2] = _bits[3] = _mm256_set1_epi32(0);
  _bits[4] = _bits[5] = _bits[6] = _bits[7] = _mm256_set1_epi32(0);

  40063d:       c5 fd 7f 44 24 60               vmovdqa %ymm0,0x60(%rsp)
  400643:       c5 fd 7f 44 24 40               vmovdqa %ymm0,0x40(%rsp)
  400649:       c5 fd 7f 44 24 20               vmovdqa %ymm0,0x20(%rsp)
  40064f:       c5 fd 7f 04 24                  vmovdqa %ymm0,(%rsp)
  400654:       c5 fd 7f 84 24 e0 00 00 00      vmovdqa %ymm0,0xe0(%rsp)
  40065d:       c5 fd 7f 84 24 c0 00 00 00      vmovdqa %ymm0,0xc0(%rsp)
  400666:       c5 fd 7f 84 24 a0 00 00 00      vmovdqa %ymm0,0xa0(%rsp)
  40066f:       c5 fd 7f 84 24 80 00 00 00      vmovdqa %ymm0,0x80(%rsp)

and then written to in a loop based on char* s.

  char c;
  while ((c = *s++))
    (*this)[c] = true;

  400680:       48 83 c2 01                     add    $0x1,%rdx
  400684:       c6 04 04 01                     movb   $0x1,(%rsp,%rax,1)
  400688:       0f b6 42 ff                     movzbl -0x1(%rdx),%eax
  40068c:       84 c0                           test   %al,%al
  40068e:       75 f0                           jne    400680 <_Z4Mainv+0x60>

Both compilers then initialize T end to 0s:

  400690:       c5 f9 ef c0                     vpxor  %xmm0,%xmm0,%xmm0
  400694:       31 c0                           xor    %eax,%eax
  400696:       c5 fd 7f 84 24 60 01 00 00      vmovdqa %ymm0,0x160(%rsp)
  40069f:       c5 fd 7f 84 24 40 01 00 00      vmovdqa %ymm0,0x140(%rsp)
  4006a8:       c5 fd 7f 84 24 20 01 00 00      vmovdqa %ymm0,0x120(%rsp)
  4006b1:       c5 fd 7f 84 24 00 01 00 00      vmovdqa %ymm0,0x100(%rsp)
  4006ba:       c5 fd 7f 84 24 e0 01 00 00      vmovdqa %ymm0,0x1e0(%rsp)
  4006c3:       c5 fd 7f 84 24 c0 01 00 00      vmovdqa %ymm0,0x1c0(%rsp)
  4006cc:       c5 fd 7f 84 24 a0 01 00 00      vmovdqa %ymm0,0x1a0(%rsp)
  4006d5:       c5 fd 7f 84 24 80 01 00 00      vmovdqa %ymm0,0x180(%rsp)

Both compilers then optimize out the _mm256_or_si256() operations because T end is known to be 0. From here they diverge. GCC 4.8.5 copies T sep into T res (which is what ORing with zero amounts to). GCC 5.3.1 instead initializes T res to 0: as far as it is concerned, sep._bits is also still zero, because the only stores into sep since its initialization went through bool lvalues. It's entitled to reason this way because in your operator[] method you cast a pointer of type __m256i* to bool*, and the compiler is permitted to assume that accesses through the two types do not alias. Thus in GCC 4.8.5 you see

  4006de:       c5 fd 6f 04 24                  vmovdqa (%rsp),%ymm0
  4006e3:       c5 fd 7f 84 24 00 02 00 00      vmovdqa %ymm0,0x200(%rsp)
  4006ec:       c5 fd 6f 44 24 20               vmovdqa 0x20(%rsp),%ymm0
  4006f2:       c5 fd 7f 84 24 20 02 00 00      vmovdqa %ymm0,0x220(%rsp)
  4006fb:       c5 fd 6f 44 24 40               vmovdqa 0x40(%rsp),%ymm0
  400701:       c5 fd 7f 84 24 40 02 00 00      vmovdqa %ymm0,0x240(%rsp)
  40070a:       c5 fd 6f 44 24 60               vmovdqa 0x60(%rsp),%ymm0
  400710:       c5 fd 7f 84 24 60 02 00 00      vmovdqa %ymm0,0x260(%rsp)
  400719:       c5 fd 6f 84 24 80 00 00 00      vmovdqa 0x80(%rsp),%ymm0
  400722:       c5 fd 7f 84 24 80 02 00 00      vmovdqa %ymm0,0x280(%rsp)
  40072b:       c5 fd 6f 84 24 a0 00 00 00      vmovdqa 0xa0(%rsp),%ymm0
  400734:       c5 fd 7f 84 24 a0 02 00 00      vmovdqa %ymm0,0x2a0(%rsp)
  40073d:       c5 fd 6f 84 24 c0 00 00 00      vmovdqa 0xc0(%rsp),%ymm0
  400746:       c5 fd 7f 84 24 c0 02 00 00      vmovdqa %ymm0,0x2c0(%rsp)
  40074f:       c5 fd 6f 84 24 e0 00 00 00      vmovdqa 0xe0(%rsp),%ymm0
  400758:       c5 fd 7f 84 24 e0 02 00 00      vmovdqa %ymm0,0x2e0(%rsp)

while in GCC 5.3.1 you see

  4006fa:       c5 fd 7f 85 f0 fe ff ff         vmovdqa %ymm0,-0x110(%rbp)
  400702:       c5 fd 7f 85 10 ff ff ff         vmovdqa %ymm0,-0xf0(%rbp)
  40070a:       c5 fd 7f 85 30 ff ff ff         vmovdqa %ymm0,-0xd0(%rbp)
  400712:       c5 fd 7f 85 50 ff ff ff         vmovdqa %ymm0,-0xb0(%rbp)
  40071a:       c5 fd 7f 85 70 ff ff ff         vmovdqa %ymm0,-0x90(%rbp)
  400722:       c5 fd 7f 45 90                  vmovdqa %ymm0,-0x70(%rbp)
  400727:       c5 fd 7f 45 b0                  vmovdqa %ymm0,-0x50(%rbp)
  40072c:       c5 fd 7f 45 d0                  vmovdqa %ymm0,-0x30(%rbp)

whereupon the byte-wise reads in the assert() find zeros in res where sep holds ones, and the assertion fails.
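The kind of inference involved can be seen in a much smaller, well-defined example. The sketch below is illustrative only (store_unrelated and store_bytes are hypothetical names, and long/double merely stand in for __m256i/bool): in the first function the compiler may keep the value 1 in a register across the second store, while in the second it has to reload.

```cpp
#include <cassert>

// long and double are unrelated types, so the compiler may assume that *a and
// *b are different objects and return the constant 1 without reloading *a.
inline long store_unrelated(long* a, double* b) {
  *a = 1;
  *b = 2.0;
  return *a;
}

// unsigned char may alias any object, so the compiler has to assume that the
// store through b can change *a, and reload it before returning.
inline long store_bytes(long* a, unsigned char* b) {
  *a = 1;
  *b = 2;
  return *a;
}
```

Calling store_bytes with b pointing into the same long is legal and returns the modified value. Calling store_unrelated with both pointers aimed at the same storage would be undefined behaviour, which is the situation your bool* cast creates.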

The Standard's Ruling on Pointer Aliasing:

ISO C++11 addresses aliasing in the following section, which makes clear that an object of type __m256i cannot be accessed through an lvalue of type bool, but may be accessed through char or unsigned char:

§ 3.10 Lvalues and rvalues [basic.lval]

[...]

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined: [52]

  • the dynamic type of the object,
  • a cv-qualified version of the dynamic type of the object,
  • a type similar (as defined in 4.4) to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
  • a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
  • a char or unsigned char type.

52) The intent of this list is to specify those circumstances in which an object may or may not be aliased.
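The last bullet is the one that matters here. As a hedged illustration, the sketch below reads and writes individual bytes of a wider object in two ways that stay within the list: directly through unsigned char, and by copying with std::memcpy. Block, byte_at and set_byte are hypothetical names, and Block is a 32-byte stand-in for a single __m256i so that the example builds without -mavx2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical 32-byte block standing in for one __m256i element.
struct Block {
  alignas(32) std::uint64_t lanes[4];
};

// Read byte i of the block's object representation through unsigned char.
inline unsigned char byte_at(Block const& b, std::size_t i) {
  return reinterpret_cast<unsigned char const*>(&b)[i];
}

// Write byte i by copying one byte into the object representation, which is
// well-defined for trivially copyable types such as Block.
inline void set_byte(Block& b, std::size_t i, unsigned char v) {
  std::memcpy(reinterpret_cast<unsigned char*>(&b) + i, &v, 1);
}
```

Either form keeps the byte-level view and the wide view of the same storage consistent, because the compiler must assume that accesses of these kinds may modify any object.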

Appendix

GCC 4.8.5:

0000000000400620 <_Z4Mainv>:
  400620:       55                              push   %rbp
  400621:       c5 f9 ef c0                     vpxor  %xmm0,%xmm0,%xmm0
  400625:       ba e5 08 40 00                  mov    $0x4008e5,%edx
  40062a:       b8 20 00 00 00                  mov    $0x20,%eax
  40062f:       48 89 e5                        mov    %rsp,%rbp
  400632:       48 83 e4 e0                     and    $0xffffffffffffffe0,%rsp
  400636:       48 81 ec 00 03 00 00            sub    $0x300,%rsp
  40063d:       c5 fd 7f 44 24 60               vmovdqa %ymm0,0x60(%rsp)
  400643:       c5 fd 7f 44 24 40               vmovdqa %ymm0,0x40(%rsp)
  400649:       c5 fd 7f 44 24 20               vmovdqa %ymm0,0x20(%rsp)
  40064f:       c5 fd 7f 04 24                  vmovdqa %ymm0,(%rsp)
  400654:       c5 fd 7f 84 24 e0 00 00 00      vmovdqa %ymm0,0xe0(%rsp)
  40065d:       c5 fd 7f 84 24 c0 00 00 00      vmovdqa %ymm0,0xc0(%rsp)
  400666:       c5 fd 7f 84 24 a0 00 00 00      vmovdqa %ymm0,0xa0(%rsp)
  40066f:       c5 fd 7f 84 24 80 00 00 00      vmovdqa %ymm0,0x80(%rsp)
  400678:       0f 1f 84 00 00 00 00 00         nopl   0x0(%rax,%rax,1)
  400680:       48 83 c2 01                     add    $0x1,%rdx
  400684:       c6 04 04 01                     movb   $0x1,(%rsp,%rax,1)
  400688:       0f b6 42 ff                     movzbl -0x1(%rdx),%eax
  40068c:       84 c0                           test   %al,%al
  40068e:       75 f0                           jne    400680 <_Z4Mainv+0x60>
  400690:       c5 f9 ef c0                     vpxor  %xmm0,%xmm0,%xmm0
  400694:       31 c0                           xor    %eax,%eax
  400696:       c5 fd 7f 84 24 60 01 00 00      vmovdqa %ymm0,0x160(%rsp)
  40069f:       c5 fd 7f 84 24 40 01 00 00      vmovdqa %ymm0,0x140(%rsp)
  4006a8:       c5 fd 7f 84 24 20 01 00 00      vmovdqa %ymm0,0x120(%rsp)
  4006b1:       c5 fd 7f 84 24 00 01 00 00      vmovdqa %ymm0,0x100(%rsp)
  4006ba:       c5 fd 7f 84 24 e0 01 00 00      vmovdqa %ymm0,0x1e0(%rsp)
  4006c3:       c5 fd 7f 84 24 c0 01 00 00      vmovdqa %ymm0,0x1c0(%rsp)
  4006cc:       c5 fd 7f 84 24 a0 01 00 00      vmovdqa %ymm0,0x1a0(%rsp)
  4006d5:       c5 fd 7f 84 24 80 01 00 00      vmovdqa %ymm0,0x180(%rsp)
  4006de:       c5 fd 6f 04 24                  vmovdqa (%rsp),%ymm0
  4006e3:       c5 fd 7f 84 24 00 02 00 00      vmovdqa %ymm0,0x200(%rsp)
  4006ec:       c5 fd 6f 44 24 20               vmovdqa 0x20(%rsp),%ymm0
  4006f2:       c5 fd 7f 84 24 20 02 00 00      vmovdqa %ymm0,0x220(%rsp)
  4006fb:       c5 fd 6f 44 24 40               vmovdqa 0x40(%rsp),%ymm0
  400701:       c5 fd 7f 84 24 40 02 00 00      vmovdqa %ymm0,0x240(%rsp)
  40070a:       c5 fd 6f 44 24 60               vmovdqa 0x60(%rsp),%ymm0
  400710:       c5 fd 7f 84 24 60 02 00 00      vmovdqa %ymm0,0x260(%rsp)
  400719:       c5 fd 6f 84 24 80 00 00 00      vmovdqa 0x80(%rsp),%ymm0
  400722:       c5 fd 7f 84 24 80 02 00 00      vmovdqa %ymm0,0x280(%rsp)
  40072b:       c5 fd 6f 84 24 a0 00 00 00      vmovdqa 0xa0(%rsp),%ymm0
  400734:       c5 fd 7f 84 24 a0 02 00 00      vmovdqa %ymm0,0x2a0(%rsp)
  40073d:       c5 fd 6f 84 24 c0 00 00 00      vmovdqa 0xc0(%rsp),%ymm0
  400746:       c5 fd 7f 84 24 c0 02 00 00      vmovdqa %ymm0,0x2c0(%rsp)
  40074f:       c5 fd 6f 84 24 e0 00 00 00      vmovdqa 0xe0(%rsp),%ymm0
  400758:       c5 fd 7f 84 24 e0 02 00 00      vmovdqa %ymm0,0x2e0(%rsp)
  400761:       0f 1f 80 00 00 00 00            nopl   0x0(%rax)
  400768:       80 3c 04 00                     cmpb   $0x0,(%rsp,%rax,1)
  40076c:       0f b6 8c 04 00 02 00 00         movzbl 0x200(%rsp,%rax,1),%ecx
  400774:       ba 01 00 00 00                  mov    $0x1,%edx
  400779:       75 08                           jne    400783 <_Z4Mainv+0x163>
  40077b:       0f b6 94 04 00 01 00 00         movzbl 0x100(%rsp,%rax,1),%edx
  400783:       38 d1                           cmp    %dl,%cl
  400785:       0f 85 b2 00 00 00               jne    40083d <_Z4Mainv+0x21d>
  40078b:       48 83 c0 01                     add    $0x1,%rax
  40078f:       48 3d 00 01 00 00               cmp    $0x100,%rax
  400795:       75 d1                           jne    400768 <_Z4Mainv+0x148>
  400797:       c5 fd 6f 8c 24 00 02 00 00      vmovdqa 0x200(%rsp),%ymm1
  4007a0:       31 c0                           xor    %eax,%eax
  4007a2:       c4 e2 7d 17 c9                  vptest %ymm1,%ymm1
  4007a7:       0f 94 c0                        sete   %al
  4007aa:       0f 85 88 00 00 00               jne    400838 <_Z4Mainv+0x218>
  4007b0:       c5 fd 6f 8c 24 20 02 00 00      vmovdqa 0x220(%rsp),%ymm1
  4007b9:       31 c0                           xor    %eax,%eax
  4007bb:       c4 e2 7d 17 c9                  vptest %ymm1,%ymm1
  4007c0:       0f 94 c0                        sete   %al
  4007c3:       75 73                           jne    400838 <_Z4Mainv+0x218>
  4007c5:       c5 fd 6f 8c 24 40 02 00 00      vmovdqa 0x240(%rsp),%ymm1
  4007ce:       31 c0                           xor    %eax,%eax
  4007d0:       c4 e2 7d 17 c9                  vptest %ymm1,%ymm1
  4007d5:       0f 94 c0                        sete   %al
  4007d8:       75 5e                           jne    400838 <_Z4Mainv+0x218>
  4007da:       c5 fd 6f 8c 24 60 02 00 00      vmovdqa 0x260(%rsp),%ymm1
  4007e3:       31 c0                           xor    %eax,%eax
  4007e5:       c4 e2 7d 17 c9                  vptest %ymm1,%ymm1
  4007ea:       0f 94 c0                        sete   %al
  4007ed:       75 49                           jne    400838 <_Z4Mainv+0x218>
  4007ef:       c5 fd 6f 8c 24 80 02 00 00      vmovdqa 0x280(%rsp),%ymm1
  4007f8:       31 c0                           xor    %eax,%eax
  4007fa:       c4 e2 7d 17 c9                  vptest %ymm1,%ymm1
  4007ff:       0f 94 c0                        sete   %al
  400802:       75 34                           jne    400838 <_Z4Mainv+0x218>
  400804:       c5 fd 6f 8c 24 a0 02 00 00      vmovdqa 0x2a0(%rsp),%ymm1
  40080d:       31 c0                           xor    %eax,%eax
  40080f:       c4 e2 7d 17 c9                  vptest %ymm1,%ymm1
  400814:       0f 94 c0                        sete   %al
  400817:       75 1f                           jne    400838 <_Z4Mainv+0x218>
  400819:       c5 fd 6f 8c 24 c0 02 00 00      vmovdqa 0x2c0(%rsp),%ymm1
  400822:       31 c0                           xor    %eax,%eax
  400824:       c4 e2 7d 17 c9                  vptest %ymm1,%ymm1
  400829:       0f 94 c0                        sete   %al
  40082c:       75 0a                           jne    400838 <_Z4Mainv+0x218>
  40082e:       31 c0                           xor    %eax,%eax
  400830:       c4 e2 7d 17 c0                  vptest %ymm0,%ymm0
  400835:       0f 94 c0                        sete   %al
  400838:       c5 f8 77                        vzeroupper 
  40083b:       c9                              leaveq 
  40083c:       c3                              retq   
  40083d:       b9 20 09 40 00                  mov    $0x400920,%ecx
  400842:       ba 26 00 00 00                  mov    $0x26,%edx
  400847:       be e9 08 40 00                  mov    $0x4008e9,%esi
  40084c:       bf f8 08 40 00                  mov    $0x4008f8,%edi
  400851:       c5 f8 77                        vzeroupper 
  400854:       e8 97 fc ff ff                  callq  4004f0 <__assert_fail@plt>
  400859:       0f 1f 80 00 00 00 00            nopl   0x0(%rax)

GCC 5.3.1:

0000000000400630 <_Z4Mainv>:
  400630:       4c 8d 54 24 08                  lea    0x8(%rsp),%r10
  400635:       48 83 e4 e0                     and    $0xffffffffffffffe0,%rsp
  400639:       b8 20 00 00 00                  mov    $0x20,%eax
  40063e:       c5 f9 ef c0                     vpxor  %xmm0,%xmm0,%xmm0
  400642:       ba 25 08 40 00                  mov    $0x400825,%edx
  400647:       41 ff 72 f8                     pushq  -0x8(%r10)
  40064b:       55                              push   %rbp
  40064c:       48 89 e5                        mov    %rsp,%rbp
  40064f:       41 52                           push   %r10
  400651:       48 81 ec 08 03 00 00            sub    $0x308,%rsp
  400658:       c5 fd 7f 85 50 fd ff ff         vmovdqa %ymm0,-0x2b0(%rbp)
  400660:       c5 fd 7f 85 30 fd ff ff         vmovdqa %ymm0,-0x2d0(%rbp)
  400668:       c5 fd 7f 85 10 fd ff ff         vmovdqa %ymm0,-0x2f0(%rbp)
  400670:       c5 fd 7f 85 f0 fc ff ff         vmovdqa %ymm0,-0x310(%rbp)
  400678:       c5 fd 7f 85 d0 fd ff ff         vmovdqa %ymm0,-0x230(%rbp)
  400680:       c5 fd 7f 85 b0 fd ff ff         vmovdqa %ymm0,-0x250(%rbp)
  400688:       c5 fd 7f 85 90 fd ff ff         vmovdqa %ymm0,-0x270(%rbp)
  400690:       c5 fd 7f 85 70 fd ff ff         vmovdqa %ymm0,-0x290(%rbp)
  400698:       0f 1f 84 00 00 00 00 00         nopl   0x0(%rax,%rax,1)
  4006a0:       48 83 c2 01                     add    $0x1,%rdx
  4006a4:       c6 84 05 f0 fc ff ff 01         movb   $0x1,-0x310(%rbp,%rax,1)
  4006ac:       0f b6 42 ff                     movzbl -0x1(%rdx),%eax
  4006b0:       84 c0                           test   %al,%al
  4006b2:       75 ec                           jne    4006a0 <_Z4Mainv+0x70>
  4006b4:       c5 f9 ef c0                     vpxor  %xmm0,%xmm0,%xmm0
  4006b8:       31 c0                           xor    %eax,%eax
  4006ba:       c5 fd 7f 85 50 fe ff ff         vmovdqa %ymm0,-0x1b0(%rbp)
  4006c2:       c5 fd 7f 85 30 fe ff ff         vmovdqa %ymm0,-0x1d0(%rbp)
  4006ca:       c5 fd 7f 85 10 fe ff ff         vmovdqa %ymm0,-0x1f0(%rbp)
  4006d2:       c5 fd 7f 85 f0 fd ff ff         vmovdqa %ymm0,-0x210(%rbp)
  4006da:       c5 fd 7f 85 d0 fe ff ff         vmovdqa %ymm0,-0x130(%rbp)
  4006e2:       c5 fd 7f 85 b0 fe ff ff         vmovdqa %ymm0,-0x150(%rbp)
  4006ea:       c5 fd 7f 85 90 fe ff ff         vmovdqa %ymm0,-0x170(%rbp)
  4006f2:       c5 fd 7f 85 70 fe ff ff         vmovdqa %ymm0,-0x190(%rbp)
  4006fa:       c5 fd 7f 85 f0 fe ff ff         vmovdqa %ymm0,-0x110(%rbp)
  400702:       c5 fd 7f 85 10 ff ff ff         vmovdqa %ymm0,-0xf0(%rbp)
  40070a:       c5 fd 7f 85 30 ff ff ff         vmovdqa %ymm0,-0xd0(%rbp)
  400712:       c5 fd 7f 85 50 ff ff ff         vmovdqa %ymm0,-0xb0(%rbp)
  40071a:       c5 fd 7f 85 70 ff ff ff         vmovdqa %ymm0,-0x90(%rbp)
  400722:       c5 fd 7f 45 90                  vmovdqa %ymm0,-0x70(%rbp)
  400727:       c5 fd 7f 45 b0                  vmovdqa %ymm0,-0x50(%rbp)
  40072c:       c5 fd 7f 45 d0                  vmovdqa %ymm0,-0x30(%rbp)
  400731:       0f 1f 80 00 00 00 00            nopl   0x0(%rax)
  400738:       0f b6 94 05 f0 fc ff ff         movzbl -0x310(%rbp,%rax,1),%edx
  400740:       0f b6 8c 05 f0 fe ff ff         movzbl -0x110(%rbp,%rax,1),%ecx
  400748:       84 d2                           test   %dl,%dl
  40074a:       75 08                           jne    400754 <_Z4Mainv+0x124>
  40074c:       0f b6 94 05 f0 fd ff ff         movzbl -0x210(%rbp,%rax,1),%edx
  400754:       38 d1                           cmp    %dl,%cl
  400756:       75 2c                           jne    400784 <_Z4Mainv+0x154>
  400758:       48 83 c0 01                     add    $0x1,%rax
  40075c:       48 3d 00 01 00 00               cmp    $0x100,%rax
  400762:       75 d4                           jne    400738 <_Z4Mainv+0x108>
  400764:       c5 f9 ef c0                     vpxor  %xmm0,%xmm0,%xmm0
  400768:       31 c0                           xor    %eax,%eax
  40076a:       c4 e2 7d 17 c0                  vptest %ymm0,%ymm0
  40076f:       0f 94 c0                        sete   %al
  400772:       c5 f8 77                        vzeroupper 
  400775:       48 81 c4 08 03 00 00            add    $0x308,%rsp
  40077c:       41 5a                           pop    %r10
  40077e:       5d                              pop    %rbp
  40077f:       49 8d 62 f8                     lea    -0x8(%r10),%rsp
  400783:       c3                              retq   
  400784:       b9 60 08 40 00                  mov    $0x400860,%ecx
  400789:       ba 26 00 00 00                  mov    $0x26,%edx
  40078e:       be 29 08 40 00                  mov    $0x400829,%esi
  400793:       bf 38 08 40 00                  mov    $0x400838,%edi
  400798:       c5 f8 77                        vzeroupper 
  40079b:       e8 50 fd ff ff                  callq  4004f0 <__assert_fail@plt>

Upvotes: 8
