nonopolarity

Reputation: 151036

Is the backslash n `\n` really Line Feed, but is treated as a Newline?

In all my years of using C, Ruby, Perl, Java, PHP, and many other languages, I just took \n to mean newline. So it turns out it is only "Line Feed" but is treated as "Newline"?

It looks like the following are some facts, some of them from the Wikipedia article:

  1. \n is Line Feed, not Newline, but is treated by almost all systems to mean "a new line".
  2. In theory, \r\n is really what a newline is, because it moves the cursor back to the first position horizontally, and moves the cursor down one row. However, although some systems use this as the Newline (Windows), some systems just treat a single \n to mean Newline (Unix, Linux, Mac).
  3. There were some systems that actually had a Newline character, such as IBM mainframes and the ZX80, but it never became a character in ASCII, and we almost never use it.
  4. So in ASCII, we really still only have Carriage Return \r which is 0x0d (Decimal 13), and Line Feed \n, which is 0x0a (Decimal 10).
  5. And as far as programming is concerned, outputting the 0x0a character is meant to be a Newline, although in reality it is only a Line Feed.
  6. Theoretically, if we output 0x0a, one can argue that the next output should continue at the same horizontal position as the previous line, instead of at the leftmost position, but in practice it does not. It continues at the leftmost position on most systems.
  7. The \r is still being used by some programs to keep outputting information on the same line without scrolling up. This works, although we never know how many "blank spaces" to print out to cover up the old information. That can be handled with the ANSI escape sequence \033[K, which clears from the cursor to the end of the line, or more correctly by using a curses library that works on that particular platform.
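Point 7 can be sketched in C as follows. This is only an illustration: format_status and show_status are hypothetical names, and it assumes a terminal that understands ANSI escape sequences.

```c
#include <stdio.h>

/* Hypothetical helper: builds a status line that overwrites the current
 * terminal line. "\r" moves back to the first column, and ESC [ K
 * (written "\033[K") erases from the cursor to the end of the line, so
 * leftovers from a longer previous message disappear.
 * Returns the value of snprintf (number of characters needed). */
int format_status(char *buf, size_t cap, int percent)
{
    return snprintf(buf, cap, "\r\033[KProgress: %d%%", percent);
}

/* Writes the status without a trailing '\n', so the cursor stays on the
 * same line and the next call overwrites this one. */
void show_status(int percent)
{
    char line[64];
    format_status(line, sizeof line, percent);
    fputs(line, stdout);
    fflush(stdout); /* no '\n' is written, so flush explicitly */
}
```

Calling show_status(10), show_status(20), and so on in a loop keeps redrawing one line; a final putchar('\n') moves on once the work is done.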

I think the two main takeaway points are:

  1. We really don't have a newline character, but we take the \n, which is Line Feed, to mean Newline, and most systems just treat this Line Feed to "mean" Newline.
  2. It is really 0x0a, just to set the record straight. I thought for many years it was 0x0d, but it is not.
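The numeric values are easy to check in C. A minimal sketch, assuming an ASCII-based implementation (the C standard itself only says the values are implementation-defined); both function names are made up for illustration:

```c
#include <stdio.h>

/* Returns 1 when '\n' is 0x0A and '\r' is 0x0D, which holds on
 * ASCII-based implementations but is not guaranteed by the C standard. */
int escapes_match_ascii(void)
{
    return '\n' == 0x0A && '\r' == 0x0D;
}

/* Prints the numeric value of each escape in hex and decimal. */
void print_escape_values(void)
{
    printf("'\\n' = 0x%02x (%d)\n", (unsigned)'\n', '\n');
    printf("'\\r' = 0x%02x (%d)\n", (unsigned)'\r', '\r');
}
```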

Is the above accurate, or are there any corrections or additions?

Upvotes: 2

Views: 6331

Answers (4)

hackerb9

Reputation: 1912

Is '\n' really Line Feed? No.

Linefeed is a specific character in ASCII (and thus Unicode) at code point 0x0A. It can be represented in C as '\012'.

Newline is a behavior on the output terminal: move the cursor to the beginning of the next line. In ASCII terms, it is a combination of the behaviors of Linefeed and Carriage Return, but has no assigned character. It is represented in C as '\n'.

While it may seem that sending a Linefeed from a programming language always results in a Newline, that is an illusion due to two factors:

  • Microsoft's C library emits a Carriage Return before each Linefeed when writing to stdout or "text" files.
  • UNIX tty devices default to intercepting any Linefeed being sent to a terminal and prepending a Carriage Return.

That is the answer to the original question, but if you want to know why, keep reading.


Looking behind the illusion

Under UNIX, one can disable post-processing of the output sent to a terminal by running stty -opost. For example:

$ stty -opost
$ printf "foo\nbar\n"
foo
   bar
      $ stty sane
$ printf "foo\nbar\n"
foo
bar

To do the same from within a C program one would typically use cfmakeraw() or tcsetattr() in UNIX. Under Microsoft Windows, one could use fopen("foo","wb") or _setmode().
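As a rough sketch of the tcsetattr() route on UNIX, the following clears only the OPOST flag rather than going fully raw as cfmakeraw() would. The function names are hypothetical, and error handling is kept minimal.

```c
#include <termios.h>
#include <unistd.h>

/* Returns a copy of the settings with output post-processing turned off.
 * The caller keeps the original so it can be restored later. */
struct termios without_opost(struct termios t)
{
    t.c_oflag &= ~(tcflag_t)OPOST;
    return t;
}

/* Disables output post-processing on fd (for example STDOUT_FILENO).
 * The previous settings are stored in *saved.
 * Returns 0 on success, -1 on failure (e.g. fd is not a terminal). */
int disable_output_processing(int fd, struct termios *saved)
{
    if (tcgetattr(fd, saved) == -1)
        return -1;
    struct termios raw = without_opost(*saved);
    return tcsetattr(fd, TCSADRAIN, &raw);
}
```

After a successful call, printf("foo\nbar\n") should show the same staircase as the stty -opost session above; tcsetattr(fd, TCSADRAIN, &saved) restores the earlier behavior.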

The C standard and implementations

The C23 standard states

§ 5.2.2 Character display semantics

The active position is that location on a display device where the next character output by the fputc function would appear. [...] Alphabetic escape sequences representing nongraphic characters in the execution character set are intended to produce actions on display devices as follows:

  • \a (alert) Produces an audible or visible alert without changing the active position.
  • \b (backspace) Moves the active position to the previous position on the current line. If the active position is at the initial position of a line, the behavior of the display device is unspecified.
  • \f (form feed) Moves the active position to the initial position at the start of the next logical page.
  • \n (new line) Moves the active position to the initial position of the next line.
  • \r (carriage return) Moves the active position to the initial position of the current line.
  • \t (horizontal tab) Moves the active position to the next horizontal tabulation position on the current line. If the active position is at or past the last defined horizontal tabulation position, the behavior of the display device is unspecified.
  • \v (vertical tab) Moves the active position to the initial position of the next vertical tabulation position. If the active position is at or past the last defined vertical tabulation position, the behavior of the display device is unspecified.

Each of these escape sequences shall produce a unique implementation-defined value which can be stored in a single char object. The external representations in a text file need not be identical to the internal representations, and are outside the scope of this document.

Note that '\n' must be a single char in memory. What that character is and how it is converted into the Newline behavior is up to the implementation.

For most OSes this is not a problem as they have UNIX underpinnings (GNU/Linux, BSD, MacOS, Chrome OS, etc.). Under UNIX, Newline is Linefeed, or to speak more precisely, the Newline behavior is represented by the Linefeed character. UNIX programmers don't need to worry about the difference; '\n' always sends a Linefeed character and it is only transformed by the driver that talks to terminals. This makes sense since only a terminal can perform the Newline behavior.

Microsoft went a different route. Newline is represented by Carriage Return and Linefeed in Windows. It is up to every program and file to always use both. Microsoft's C, like UNIX, uses Linefeed in memory for '\n'. However, instead of having the terminal driver do the conversion, Microsoft's C library defaults to inserting a Carriage Return before every Linefeed that is output. This occurs not just for '\n', but also '\012', 0x0A, and 0b1010. This is why Windows programmers must differentiate between "text" and "binary" files.
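The effect of that text-mode translation can be imitated in portable C. The sketch below is not Microsoft's implementation, only an illustration of what happens to the bytes; lf_to_crlf is a hypothetical name.

```c
#include <stddef.h>

/* Copies the NUL-terminated string src into dst, inserting a Carriage
 * Return before every Linefeed, as a text-mode stream on Windows would.
 * Returns the number of bytes written (not counting the terminator),
 * or (size_t)-1 if dst (capacity cap) is too small. */
size_t lf_to_crlf(const char *src, char *dst, size_t cap)
{
    size_t n = 0;

    if (cap == 0)
        return (size_t)-1;

    for (; *src != '\0'; ++src) {
        size_t need = (*src == '\n') ? 2 : 1;
        if (n + need >= cap)        /* keep one byte for the terminator */
            return (size_t)-1;
        if (*src == '\n')
            dst[n++] = '\r';
        dst[n++] = *src;
    }
    dst[n] = '\0';
    return n;
}
```

The 8-byte string "foo\nbar\n" comes out as 10 bytes, which is the size difference a Windows programmer sees between a file written in text mode and the same data written in binary mode.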


Footnote: NEL, a minor curiosity

Historically, newline has been represented by other single byte characters. In particular, NEL, 0x85, is defined to mean "move the cursor to the beginning of the next line, scrolling the screen if necessary", which would be perfect. Unfortunately, it is an "8-bit control", not ASCII, so the single byte 0x85 is not valid on its own in UTF-8 (the code point U+0085 is encoded there as the two bytes 0xC2 0x85). NEL can still be sent using a 7-bit sequence, Esc E, but that takes two bytes and does not meet the C standard's requirement of fitting in a single char.
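To see why NEL does not fit in one byte of UTF-8, here is a small encoder sketch limited to code points below 0x800. The function name is made up for illustration.

```c
#include <stddef.h>

/* Encodes a code point below 0x800 as UTF-8 into out.
 * Returns the number of bytes used (1 or 2), or 0 if cp is out of the
 * range this sketch handles. */
size_t utf8_encode_small(unsigned cp, unsigned char out[2])
{
    if (cp < 0x80) {                 /* ASCII range: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    return 0;
}
```

Linefeed (0x0A) stays a single byte, while NEL (0x85) becomes the pair 0xC2 0x85.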

Upvotes: -1

check

Reputation: 11

I meant to write this as a comment but I can't.

While reading the Build Your Own Text Editor tutorial, I came across this in the Fix Ctrl-M section:

It turns out that the terminal is helpfully translating any carriage returns (13, '\r') inputted by the user into newlines (10, '\n').

Later it talks about output processing:

It turns out that the terminal does a similar translation on the output side. It translates each newline ("\n") we print into a carriage return followed by a newline ("\r\n").

If you look at the man page for termios(3), it says:

ICRNL Translate carriage return to newline on input (unless IGNCR is set).

OPOST Enable implementation-defined output processing.
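Both flags can be inspected from C once the settings have been read with tcgetattr(). A minimal sketch with hypothetical helper names:

```c
#include <termios.h>

/* True when a typed Carriage Return is translated to newline on input:
 * ICRNL must be set and IGNCR must not be. */
int input_maps_cr_to_nl(const struct termios *t)
{
    return (t->c_iflag & ICRNL) != 0 && (t->c_iflag & IGNCR) == 0;
}

/* True when implementation-defined output processing is enabled. */
int output_is_post_processed(const struct termios *t)
{
    return (t->c_oflag & OPOST) != 0;
}
```

A typical use would fill a struct termios with tcgetattr(STDIN_FILENO, &t) and pass its address to either helper.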

Upvotes: 1

Joop Eggen

Reputation: 109557

Nowadays there are the following line endings ("newlines"):

  • Old MacOS 0x0D = 13 = CR = \r = carriage return
  • Linux 0x0A = 10 = LF = \n = linefeed
  • AS400 0x15 = EBCDIC NL, which maps to NEL (0x85)
  • Windows 0x0D 0x0A = CR-LF = \r\n

CR and LF stem from the mechanical typewriter with its paper-holding carriage. For CR, a handle on the end of the carriage returned the carriage to the start of the line. LF rolled the paper two half lines up.

A language like Java took the strategy of reading lines without passing on the newline character(s). It can deal with any line endings. For the current platform it provides a property holding the actual newline (line separator).

For pattern matching there is the regular expression \R (Java "\\R"), which matches any newline sequence.
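The same idea of accepting any line ending can be sketched in C. This handles only CR-LF, a lone LF, and a lone CR, which is a subset of what \R matches in Java; the function names are hypothetical.

```c
#include <stddef.h>

/* Returns the length of the line break starting at s:
 * 2 for "\r\n", 1 for a lone '\n' or '\r', 0 if s does not start with one. */
size_t line_break_length(const char *s)
{
    if (s[0] == '\r')
        return (s[1] == '\n') ? 2 : 1;
    return (s[0] == '\n') ? 1 : 0;
}

/* Counts the lines in a NUL-terminated string, whatever mix of line
 * endings it uses. A final line without a line break still counts. */
size_t count_lines(const char *s)
{
    size_t lines = 0;
    int in_line = 0;            /* have we seen text since the last break? */

    while (*s != '\0') {
        size_t brk = line_break_length(s);
        if (brk > 0) {
            lines++;
            in_line = 0;
            s += brk;
        } else {
            in_line = 1;
            s++;
        }
    }
    return lines + (in_line ? 1 : 0);
}
```

Treating "\r\n" as one break rather than two is the important detail; otherwise every Windows line would be counted twice.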

Upvotes: 3

EvilTeach

Reputation: 28837

New Line is an abstract name for a text file line terminator. On Windows machines, it is implemented as 0x0d 0x0a. On Unix machines it is implemented as 0x0a. On old Macs it is implemented as 0x0d.

Those implementation values are all ASCII characters. They are inherited from teletypes. 0x0d actually caused the carriage to move so the next character printed is in column 1. 0x0a actually caused the carriage to rotate one line.

I used to see this on a Model 33 Teletype :)

Another place where terms can get confusing is the ASCII NUL character. Its value is 0x00. You sometimes see it in code as '\0'. A lot of people refer to it as NULL, which is a pointer value in C/C++ whose value is normally 0.

Upvotes: 2
