Christian Rau

Reputation: 45948

Type of integer literals not int by default?

I just answered this question, which asked why iterating until 10 billion in a for loop takes so much longer (the OP actually aborted it after 10 mins) than iterating until 1 billion:

for (i = 0; i < 10000000000; i++)

Now my and many others' obvious answer was that the iteration variable is 32-bit (and so never reaches 10 billion), which turns the loop into an infinite loop.

But although I realized this problem, I still wonder what was really going on inside the compiler.

Since the literal was not suffixed with an L, it should IMHO be of type int too, and therefore 32-bit. So, due to overflow, it should end up as an ordinary int value inside the reachable range. To actually recognize that it cannot be reached from an int, the compiler needs to know that it is 10 billion and therefore treat it as a more-than-32-bit constant.

Does such a literal get promoted to a fitting (or at least implementation-defined) range (at least 64-bit, in this case) automatically, even without an L suffix, and is this standard behaviour? Or is something different going on behind the scenes, like UB due to overflow (is integer overflow actually UB)? Some quotes from the Standard would be nice, if there are any.

Although the original question was about C, I would also appreciate C++ answers, if they differ.

Upvotes: 38

Views: 15941

Answers (3)

fghj

Reputation: 9394

I still wonder what was really going on inside the compiler

You can look at the generated assembly if you are interested in how the compiler interprets the code.

10000000000:

40054f:
mov    -0x4(%rbp),%eax
mov    %eax,-0x8(%rbp)
addl   $0x1,-0x4(%rbp)
jmp    40054f <main+0xb>

So the compiler just turned it into an infinite loop. If you replace 10000000000 with 10000, you get a conditional jump instead:

....
test   %al,%al
jne    400551
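Why no comparison survives in the first listing can be sketched in C. This is only an illustration, assuming the usual 32-bit int; loop_condition is a made-up helper name, not something from the original program. The int operand is converted to the literal's wider type before the comparison, so the condition holds for every int value. Incrementing past INT_MAX would in turn be signed overflow, which is undefined behaviour in C, so the compiler is free to drop the test altogether.

```c
#include <assert.h>
#include <limits.h>

/* Assumption for this sketch: int cannot hold 10 billion
   (true for the usual 32-bit int). */
static_assert(INT_MAX < 10000000000,
              "sketch assumes int cannot hold 10 billion");

/* The loop condition from the question, isolated in a hypothetical helper.
   i is converted to the literal's (wider) type before the comparison,
   so the result is 1 for every possible int value. */
int loop_condition(int i)
{
    return i < 10000000000;
}
```

Calling loop_condition with 0, INT_MAX or INT_MIN gives 1 each time under that assumption, which is why the loop can only end through overflow of i.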

Upvotes: 2

Matteo Italia

Reputation: 126787

As far as C++ is concerned:

C++11, [lex.icon] ¶2

The type of an integer literal is the first of the corresponding list in Table 6 in which its value can be represented.

And Table 6, for literals without suffixes and decimal constants, gives:

int
long int
long long int

(interestingly, for hexadecimal or octal constants unsigned types are also allowed, but each one comes after the corresponding signed one in the list)

So it's clear that in that case the constant was interpreted as a long int (or as a long long int if long int was only 32 bits wide).

Notice that literals that are "too big" should result in a compilation error:

A program is ill-formed if one of its translation units contains an integer literal that cannot be represented by any of the allowed types.

(ibidem, ¶3)

which is promptly seen in this sample, which also reminds us that ideone.com uses 32-bit compilers.


I now see that the question was about C... well, it's more or less the same:

C99, §6.4.4.1

The type of an integer constant is the first of the corresponding list in which its value can be represented.

The list is the same as in the C++ standard.
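A rough way to check this rule on a given compiler is C11's _Generic, which selects a branch by the type of an expression. This is a sketch only: TYPE_NAME and the two functions are hypothetical names, and _Generic needs C11 rather than C99.

```c
/* Hypothetical helper macro: maps an expression to the name of its type.
   Only the types an unsuffixed decimal constant can normally take are
   listed; anything else (e.g. an extended integer type) gives "other". */
#define TYPE_NAME(x) _Generic((x),      \
    int:       "int",                   \
    long:      "long int",              \
    long long: "long long int",         \
    default:   "other")

/* 10000 fits in int on every conforming implementation. */
const char *small_literal_type(void) { return TYPE_NAME(10000); }

/* 10 billion does not fit in a 32-bit int, so the next type in the
   list that can represent it is chosen. */
const char *large_literal_type(void) { return TYPE_NAME(10000000000); }
```

Which of "long int" or "long long int" the second function returns depends on the width of long on the platform; with a 32-bit int it should not be "int".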


Addendum: both C99 and C++11 also allow literals to be of "extended integer types" (i.e. other implementation-specific integer types) if everything else fails. (C++11, [lex.icon] ¶3; C99, §6.4.4.1 ¶5, after the table)

Upvotes: 39

sarnold

Reputation: 104050

From my draft of the C standard labeled ISO/IEC 9899:TC2 Committee Draft — May 6, 2005, the rules are remarkably similar to the C++ rules Matteo found:

5 The type of an integer constant is the first of the corresponding list in which its value can be represented.

Suffix      Decimal Constant          Octal or Hexadecimal Constant
-------------------------------------------------------------------
none        int                       int
            long int                  unsigned int
            long long int             long int
                                      unsigned long int
                                      long long int
                                      unsigned long long int

u or U      unsigned int              unsigned int
            unsigned long int         unsigned long int
            unsigned long long int    unsigned long long int

l or L      long int                  long int
            long long int             unsigned long int
                                      long long int
                                      unsigned long long int

Both u or U unsigned long int         unsigned long int
and l or L  unsigned long long int    unsigned long long int

ll or LL    long long int             long long int
                                      unsigned long long int

Both u or U unsigned long long int    unsigned long long int
and ll or LL 
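The difference between the two columns of the table can be seen with the same value written in both notations. The sketch below assumes a 32-bit int and a C99 (or later) compiler; IS_UNSIGNED and the function names are made up for illustration.

```c
/* Hypothetical helper macro: nonzero if the (promoted) type of x is unsigned.
   (x) - (x) is zero in x's type; subtracting 1 wraps around to a large
   positive value for unsigned types and gives -1 for signed ones. */
#define IS_UNSIGNED(x) (((x) - (x)) - 1 > 0)

/* The same value, 2^32 - 1, written in hexadecimal and in decimal.
   Hexadecimal column: int is too small, unsigned int is tried next. */
int hex_literal_is_unsigned(void)     { return IS_UNSIGNED(0xFFFFFFFF); }

/* Decimal column: only signed types are in the list, so the constant
   becomes long int or long long int. */
int decimal_literal_is_unsigned(void) { return IS_UNSIGNED(4294967295); }
```

Under those assumptions the hexadecimal constant ends up as unsigned int, while the decimal one stays signed and simply moves to a wider type.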

Upvotes: 12
