Comissar

Reputation: 53

Why is read -rd '' required when splitting strings terminated by newlines in bash (3.2)?

An answer to the question "Split bash string by newline characters" seems to say that, because newline is the default delimiter, we should change the delimiter to null and split on that instead. Why doesn't splitting on the newline work? What I would expect (and want, in my use case) is a 1:1 correspondence between lines and \n characters in the input string (so a \n must be added to get the last line), with blank lines, leading/embedded whitespace, etc. preserved.

Quoting from Mark Gerolimatos, who seems to be asking the same question:

In OS-X/Macland, you have to use bash 3.2 (or at least without updating BASH). Thus the mysterious read -rd ' ' must be used (and works!) the online manual page I found is pretty cryptic about this (ss64.com/bash/read.html)...it's pretty mind-bending...does it mean "turn off \n, and then use emptiness as the delimiter?"

Upvotes: 0

Views: 1006

Answers (2)

ruakh

Reputation: 183211

Just to make sure we're on the same page, this is the code in that answer:

IFS=$'\n' read -rd '' -a y <<<"$x"

where x is the variable to read from and y is the array variable to populate with the lines of x.

Why doesn't splitting on the newline work?

It does; the IFS=$'\n' is telling read to split on newlines.

If you're asking why you can't write read -rd $'\n' -a y, then: the delimiter indicated by -d tells read where to stop reading. So if you set that to a newline, then read will only read one line!

What I would […] desire […] is that […] blank lines […] would be preserved.

Yes, it's annoying that initial or consecutive occurrences of the separator get discarded, such that x=$'\na\n\nb' gives the same result as x=$'a\nb'.
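A minimal sketch of that collapsing behaviour, using the same x and y names as above (the printf calls are only there for inspection, and the || true is there because read returns non-zero when it reaches the end of the input without finding a NUL):

```bash
x=$'\na\n\nb'

# Newline is an IFS whitespace character, so leading and consecutive
# newlines are collapsed instead of producing empty fields.
IFS=$'\n' read -rd '' -a y <<<"$x" || true

printf '%s\n' "${#y[@]}"    # prints 2
printf '[%s]\n' "${y[@]}"   # prints [a] then [b]
```

The two blank lines in the input leave no trace in the array.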

To satisfy your requirements, you'll need to use a slightly different approach, where you call read once per line:

y=()
while IFS= read -r -d $'\n' ; do
  y+=("$REPLY")
done <<< "${x%$'\n'*}"

In this approach, we tell read to just take the line as-is and not split it (hence IFS=), and we handle the looping ourselves.

Note that the "${x%$'\n'*}" bit strips off the last newline and everything after it, per your requirement to ignore the last line if it doesn't have a newline. (The <<< bit implicitly adds a newline.)
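To check this against a sample input, here is a small sketch; the contents of x are made up for illustration and include a leading blank line, padded whitespace, an embedded blank line, and a final fragment without a newline:

```bash
x=$'\n  a \n\nb\nno-newline-tail'

y=()
# IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
while IFS= read -r -d $'\n'; do
  y+=("$REPLY")
done <<< "${x%$'\n'*}"

printf '%s\n' "${#y[@]}"    # prints 4
printf '[%s]\n' "${y[@]}"   # prints [], [  a ], [], [b]
```

One edge case worth knowing: if x contains no newline at all, the expansion strips nothing, so the whole string comes through as a single line.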

Upvotes: 2

that other guy

Reputation: 123410

The confusion happens because read operates with two delimiters:

  1. How much to read
  2. How to split what you read

By default, this is:

  1. Read until a linefeed (i.e. one line)
  2. Split on whitespace (i.e. into words)

If you just set IFS=$'\n' you can see the problem:

  1. Read until a linefeed (again, one line)
  2. Split on linefeed (which doesn't do anything, because one line necessarily can't consist of multiple lines)
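A quick sketch of that failure mode, with a made-up three-line x and an array named y for illustration:

```bash
x=$'a\nb\nc'

# No -d option: read stops at the first linefeed, so only "a" is ever seen.
IFS=$'\n' read -r -a y <<<"$x"

printf '%s\n' "${#y[@]}"    # prints 1
printf '[%s]\n' "${y[@]}"   # prints [a]
```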

What you instead want to do is:

  1. Read all input
  2. Split on linefeed

read -d '' causes read to read until an ASCII NUL, which is not found in normal text, and is therefore a workable proxy for "read all text input".
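As a rough illustration of the two delimiters working independently (the variable names whole, status, and lines are made up for this sketch): because the NUL is never found, read consumes everything and then reports a non-zero exit status at end of input, which is worth guarding against if the script runs under set -e.

```bash
x=$'a b\nc d'

# Delimiter 1 only: read everything up to a NUL (i.e. the whole input)
# into a single variable. read hits end of input first, so it "fails".
status=0
read -r -d '' whole <<<"$x" || status=$?
echo "status=$status"        # prints status=1

# Both delimiters: read everything, then split it on linefeeds.
IFS=$'\n' read -r -d '' -a lines <<<"$x" || true
printf '[%s]\n' "${lines[@]}"   # prints [a b] then [c d]
```

Despite the non-zero status, the variables are still populated with whatever was read.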

Upvotes: 5
