PDP

Reputation: 143

How does this program duplicate itself?

This code is from Hacker's Delight, which says it is the shortest such program in C, at 64 characters, but I don't understand it:

    main(a){printf(a,34,a="main(a){printf(a,34,a=%c%s%c,34);}",34);}

I tried to compile it. It compiles with 3 warnings and no errors.

Upvotes: 13

Views: 1134

Answers (4)

haccks

Reputation: 106012

This program relies on the assumptions that

  • the return type of main is int,
  • a function parameter's type is int by default, and
  • the argument a="main(a){printf(a,34,a=%c%s%c,34);}" will be evaluated first.

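It invokes undefined behavior: C does not guarantee the order in which function arguments are evaluated, and here a is both modified and read in the same argument list with no sequencing between the two.
Nevertheless, the program is intended to work as follows: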

The assignment expression a="main(a){printf(a,34,a=%c%s%c,34);}" assigns the string "main(a){printf(a,34,a=%c%s%c,34);}" to a, and the value of the assignment expression is that same string, as per the C standard (C11 6.5.16):

An assignment operator stores a value in the object designated by the left operand. An assignment expression has the value of the left operand after the assignment [...]

Keeping in mind the above semantics of the assignment operator, the program effectively expands to

    main(a){
        printf("main(a){printf(a,34,a=%c%s%c,34);}",34,a="main(a){printf(a,34,a=%c%s%c,34);}",34);
    }

ASCII 34 is the double-quote character ("). The conversion specifiers and their corresponding arguments:

%c ---> 34 
%s ---> "main(a){printf(a,34,a=%c%s%c,34);}" 
%c ---> 34  

A better version would be

main(a){a="main(a){a=%c%s%c;printf(a,34,a,34);}";printf(a,34,a,34);}  

It is 4 characters longer (68 in total) but at least follows K&R C.
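To check the specifier mapping without relying on any of the undefined behavior, here is a minimal sketch that performs the same expansion with snprintf in ordinary standard C. The helper render_self and the two constant names are hypothetical (they are not part of either program), and ASCII is assumed so that 34 is the double quote.

```c
#include <stdio.h>

/* Format strings copied verbatim from the two programs discussed above. */
const char ORIGINAL_FORMAT[] = "main(a){printf(a,34,a=%c%s%c,34);}";
const char TWO_STATEMENT_FORMAT[] = "main(a){a=%c%s%c;printf(a,34,a,34);}";

/*
 * Hypothetical helper: expands fmt the way the quine's printf call is meant
 * to, passing 34 ('"' in ASCII) for each %c and fmt itself for %s.
 * Returns the length of the expansion, or 0 if it did not fit in out.
 */
size_t render_self(const char *fmt, char *out, size_t out_size)
{
    int written = snprintf(out, out_size, fmt, 34, fmt, 34);

    if (written < 0 || (size_t)written >= out_size)
        return 0;
    return (size_t)written;
}
```

With a 128-byte buffer, render_self(ORIGINAL_FORMAT, buf, sizeof buf) should leave the 64-character program from the question in buf, and TWO_STATEMENT_FORMAT should give the 68-character version, which matches the "4 characters longer" count.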

Upvotes: 8

Barry

Reputation: 302942

This works based on lots of quirks that C allows you to do, and some undefined behavior that happens to work in your favor. In order:

main(a) { ...

Types are assumed to be int if unspecified, so this is equivalent to:

int main(int a) { ...

Even though main is supposed to take either 0 or 2 arguments, and this is undefined behavior, in practice implementations tolerate it and the missing second argument is simply ignored.

Next, the body, which I will space out. Note that a is an int as per main:

printf(a,
       34,
       a = "main(a){printf(a,34,a=%c%s%c,34);}",
       34);

The order of evaluation of arguments is unspecified, but we're relying on the 3rd argument (the assignment) getting evaluated first. We're also relying on being able to assign a char * to an int, which standard C does not allow without a cast and which compilers typically accept only with a warning. Also, note that 34 is the ASCII value of ". Thus, the intended effect of the program is:

int main(int a, char **argv) {
    printf("main(a){printf(a,34,a=%c%s%c,34);}",
           '"',
           "main(a){printf(a,34,a=%c%s%c,34);}",
           '"');
    return 0; // also left off
}

Which, when evaluated, produces:

main(a){printf(a,34,a="main(a){printf(a,34,a=%c%s%c,34);}",34);}

which was the original program. Tada!

Upvotes: 4

zneak

Reputation: 138051

It relies on several quirks of the C language and (what I think is) undefined behavior.

First, it defines the main function. In pre-C99 C, it is legal to define a function without a return type or parameter types, and they are presumed to be int. This is why the main(a){ part works.

Then, it calls printf with 4 arguments. Since it has no prototype in scope, it is implicitly declared as returning int and taking unspecified arguments (unless your compiler implicitly declares it otherwise, like Clang does).

The first argument is a, which is presumed to be an int and holds argc at the beginning of the program. The second argument is 34 (which is ASCII for the double-quote character). The third argument is an assignment expression that assigns the format string to a and yields it. It relies on an implicit pointer-to-int conversion, which compilers generally accept with a warning. The last argument is another quote character in numeric form.

At runtime, the %c format specifiers are substituted with quotes, the %s is substituted with the format string, and you get the original source again.

As far as I know, the order of argument evaluation is unspecified. This quine works because the assignment a="main(a){printf(a,34,a=%c%s%c,34);}" is evaluated before a is passed as the first argument to printf, but as far as I know, no rule enforces that. Additionally, this is likely to fail on typical 64-bit platforms, where int is 32 bits and the pointer-to-int conversion truncates the pointer. As a matter of fact, even though I can see how it works on some platforms, it doesn't work on my computer with my compiler.

Upvotes: 5

John Bollinger

Reputation: 180201

The program is supposed to print its own code. Note the similarity of the string literal to the overall program code. The idea is that the literal will be used as the printf() format string because its value is assigned to variable a (albeit in the argument list) and that it will also be passed as the string to print (because an assignment expression evaluates to the value that was assigned). The 34 is the ASCII code for the double quote character ("); using it avoids a format string containing escaped literal quotation mark characters.

The code relies on unspecified behavior in the form of the order of evaluation of the function arguments. If they are evaluated in argument list order then the program is likely to fail because the value of a would then be used as a pointer to the format string before the correct value was actually assigned to it.
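To see which order your own compiler picks, here is a small sketch that records the order in which three argument expressions run. All names in it (note, sum3, observed_argument_order) are hypothetical. Because each argument here is a function call, the calls cannot interleave and the result is merely unspecified; the quine is in a worse position, since it modifies and reads the plain variable a in the same argument list.

```c
#include <stddef.h>

/* Records the order in which the note() calls actually ran. */
static char call_order[8];
static size_t call_count = 0;

/* Logs tag, then passes value through unchanged. */
static int note(char tag, int value)
{
    call_order[call_count++] = tag;
    call_order[call_count] = '\0';
    return value;
}

static int sum3(int x, int y, int z)
{
    return x + y + z;
}

/*
 * Returns a permutation of "123" showing the order in which this compiler
 * evaluated the three arguments. The C standard does not say which one.
 */
const char *observed_argument_order(void)
{
    call_count = 0;
    (void)sum3(note('1', 1), note('2', 2), note('3', 3));
    return call_order;
}
```

Printing the returned string under different compilers or optimization levels may give different permutations, which is exactly why the quine's reliance on "the assignment goes first" is fragile.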

Additionally, the type of a defaults to int, and there is no guarantee that int is wide enough to hold an object pointer without truncating it.
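The truncation risk can be checked directly. The following is a rough sketch with hypothetical helper names; it assumes the implementation provides uintptr_t (optional in the standard, but present on mainstream hosted compilers) and uses unsigned int rather than int so that the narrowing itself is well defined.

```c
#include <stdint.h>

/*
 * Hypothetical helper: reports whether an address value survives being
 * stored in an unsigned int and widened back again.
 */
int survives_int_round_trip(uintptr_t address)
{
    unsigned int narrowed = (unsigned int)address; /* may drop the high bits */

    return (uintptr_t)narrowed == address;
}

/* Convenience wrapper for a real object pointer, e.g. a string literal. */
int pointer_fits_in_int(const void *p)
{
    return survives_int_round_trip((uintptr_t)p);
}
```

On a platform where int and pointers have the same width, every pointer survives. Where pointers are wider, the result depends on where the object happens to live, so pointer_fits_in_int("some literal") can return 1 on one build and 0 on another. That would explain why the quine appears to work for some people and crashes for others.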

Furthermore, the C standard specifies only two portable signatures for main() (anything else is implementation-defined), and the signature used is not among them.

Moreover, the type of printf() inferred by the compiler in the absence of a prototype is incorrect. It is by no means guaranteed that the compiler will generate a calling sequence that works for it.

Upvotes: 2
