user47589

Need assistance understanding C code about newlines

This question references Reflections on Trusting Trust, figure 2.

Take a look at this snippet of code, from figure 2:

...
c = next( );
if(c != '\\')
    return(c);
c = next( );
if (c != '\\')
    return('\\');
if (c == 'n')
    return('\n');

It says:

This is an amazing piece of code. It "knows" in a completely portable way what character code is compiled for a new line in any character set. The act of knowing then allows it to recompile itself, thus perpetuating the knowledge.

I would like to read the rest of the paper. Can someone explain how the above code is recompiling itself? I'm not sure I understand how this snippet of code relates to the code in "Stage 1":

[Figure: Stage 1 (source: bell-labs.com)]

Upvotes: 2

Views: 169

Answers (3)

Yu Hao

Reputation: 122383

This piece of code translates escape sequences, which is part of the job of a C compiler.

c = next( );
if(c != '\\')
    return(c);

Here, if c is not '\\' (the backslash character \), it is not the start of an escape sequence, so the character is returned unchanged.

If it is, then it is the start of an escape sequence.

c = next( );
if (c == '\\')
    return('\\');
if (c == 'n')
    return('\n');

There is a typo in your question: the second test should be if (c == '\\'), not if (c != '\\'). This part examines the character that follows the \. If it is another \, the whole escape sequence is \\, so a single backslash is returned. The same applies to \n: a backslash followed by n is returned as the newline character.

Upvotes: 2

rici

Reputation: 241691

Ken Thompson's paper describes that code as follows (emphasis added):

Figure 2 is an idealization of the code in the C compiler that interprets the character escape sequence.

So you're looking at part of a C compiler. The C compiler is written in C, so it will be used to compile itself (or, more accurately, the next version of itself). Hence the statement that the code is able to "recompile itself".

Upvotes: 1

Daniel Williams

Reputation: 8885

The stage 2 example is very interesting because it adds an extra level of indirection to a self-replicating program.

What he means is that this compiler code is completely portable: when it sees the two characters \ and n in the source it is compiling, it returns the character code for newline without ever stating what that code is. The number comes from '\n' in the compiler's own source, which was translated by whichever compiler built it for the target system.

The paper goes on to show a very interesting Trojan horse in the compiler. If you use this same technique to make the compiler insert a bug into any program (including a new compiler) and then remove the bug from the source code, the compiled compiler will still insert the bug when it compiles the supposedly bug-free compiler source.

It is a bit confusing, but essentially it is about multiple levels of indirection.

Upvotes: 4
