pasaba por aqui

Reputation: 3549

read -N and IFS

According to the description of "read -N" in the manual page:

-N nchars return only after reading exactly NCHARS characters, unless EOF is encountered or read times out, ignoring any delimiter

However, in the output of the following command:

$ echo 'a b' | while read -N1 c; do echo ">>>$c<<<"; done
>>>a<<<
>>><<<
>>>b<<<
>>><<<

both the space and the newline have been translated into the empty string, whereas in this command:

$ echo 'a b' | while IFS= read -N1 c; do echo ">>>$c<<<"; done
>>>a<<<
>>> <<<
>>>b<<<
>>>
<<<

the space and the newline have been stored in the variable unchanged.

So it seems that delimiters still undergo some processing in the "read" or "while" command that I do not understand.

We can compare these results with those obtained using "read -n", which the manual describes as:

-n nchars return after reading NCHARS characters rather than waiting for a newline, but honor a delimiter if fewer than NCHARS characters are read before the delimiter

$ echo 'a b' | while read -n1 c; do echo ">>>$c<<<"; done
>>>a<<<
>>><<<
>>>b<<<
>>><<<

$ echo 'a b' | while IFS= read -n1 c; do echo ">>>$c<<<"; done
>>>a<<<
>>> <<<
>>>b<<<
>>><<<

Upvotes: 5

Views: 913

Answers (4)

Charles Stewart

Reputation: 11837

Using hexdump allows us to see exactly the characters making up the output, so it may be helpful to slightly change your queries:

(1) With normal IFS and using -N option

$ (echo 'a b' | while read -N1 c; do c="$c<"; echo -n "$c"; done | hexdump -C)
00000000  61 3c 3c 62 3c 3c                                 |a<<b<<|
00000006 

In this first case, the read builtin returns the empty string for both the space and 0x0a, because both characters are in the default IFS, and IFS characters are stripped from the value for the reason explained in cdarke's answer.

(2) With empty IFS and -N option

$ (IFS=""; echo 'a b' | while read -N1 c; do c="$c<"; echo -n "$c"; done | hexdump -C)
00000000  61 3c 20 3c 62 3c 0a 3c                              |a< <b<.<|
00000008

In this case, the read builtin returns each of the four characters that the echo command outputs, and both the space and 0x0a appear in the output: with an empty IFS no field splitting takes place, so each character read is assigned to the variable c unchanged.

(3) With normal IFS and -n option

$ (echo 'a b' | while read -n1 c; do c="$c<"; echo -n "$c"; done | hexdump -C)
00000000  61 3c 3c 62 3c 3c                                 |a<<b<<|
00000006 

This gives exactly the same output as case (1), although the semantics are a bit different: the read builtin returns the empty string for both the space and 0x0a because (i) both characters are in the default IFS and (ii) with the -n option read in any case treats the trailing 0x0a as the line delimiter and does not pass it on.

(4) With empty IFS and -n option

$ (IFS=""; echo 'a b' | while read -n1 c; do c="$c<"; echo -n "$c"; done | hexdump -C)
00000000  61 3c 20 3c 62 3c 3c                              |a< <b<<|
00000007

Here we observe a difference between the -n and -N options to read: with the -n option, the newline is treated specially by the read builtin and dropped, so removing 0x0a from IFS makes no difference: the newline never reaches the variable c.
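
The cases above can be put side by side with a small helper. This is only a sketch: per_char is a made-up name, and the loop is run through bash -c because -n and -N are bash extensions rather than POSIX. Case (1) is deliberately left out of the usage lines, since newer bash manuals describe the result of -N as not being split on IFS, so its output may depend on the bash version.

```shell
# per_char OPTION [raw]
# Reads stdin one character at a time with `read OPTION c` and prints each
# value of c in brackets. With "raw" as the second argument the loop uses
# `IFS= read` instead of the default IFS.
# (per_char is a hypothetical helper name, not part of bash.)
per_char() {
    if [ "${2:-}" = raw ]; then
        bash -c 'while IFS= read "$1" c; do printf "[%s]" "$c"; done' _ "$1"
    else
        bash -c 'while read "$1" c; do printf "[%s]" "$c"; done' _ "$1"
    fi
}

printf 'a b\n' | per_char -n1        # case (3): default IFS, -n
echo
printf 'a b\n' | per_char -n1 raw    # case (4): empty IFS, -n
echo
printf 'a b\n' | per_char -N1 raw    # case (2): empty IFS, -N
echo
```

Brackets are used instead of hexdump so that an empty value shows up as [] and a preserved space as [ ]; only in case (2) does the newline itself land between the last pair of brackets.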

Upvotes: 2

chepner

Reputation: 532518

read cannot decide if a character is a delimiter (to ignore it) until it has already read the character, and read must assign some value to c, even if that value is the empty string. When a delimiter is read and subsequently discarded, the value of c must be set to something, so it is assigned the empty string.

This is consistent with read used without the -n/-N options; delimiters are only discarded after they are read and if they aren't necessary to set the value of the provided parameter(s). The simplest case is when you don't provide any arguments to read:

$ read <<< " a b c "
$ echo ">>>$REPLY<<<"
>>> a b c <<<

With a single explicit argument, leading and trailing delimiters are stripped:

$ read line <<< " a b c "
$ echo ">>>$line<<<"
>>>a b c<<<

With two arguments, the first delimiter is ignored once it has been read. The second is retained, because the string only needs to be split into two words to fill the provided parameters.

$ read field1 field2 <<< " a b c "
$ echo ">>>$field1<<<"
>>>a<<<
$ echo ">>>$field2<<<"
>>>b c<<<
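
The same rule can be sketched with a non-blank delimiter. The helper names split_two and colon_two and the colon-separated sample line are assumptions made for illustration. In both helpers only as many delimiters are consumed as are needed to fill the named variables, and the remainder of the line, inner delimiters included, goes to the last variable.

```shell
# split_two: read one line from stdin into two variables using the default IFS.
# (hypothetical helper for illustration)
split_two() {
    read -r first rest
    printf '[%s][%s]\n' "$first" "$rest"
}

# colon_two: same, but IFS is ":" for this one read only.
# (hypothetical helper for illustration)
colon_two() {
    IFS=: read -r first rest
    printf '[%s][%s]\n' "$first" "$rest"
}

# Surrounding blanks are stripped; the blank inside "b c" is kept.
printf ' a b c \n' | split_two

# The first colon separates the fields; the later colons stay in $rest.
printf 'root:x:0:0\n' | colon_two
```

Writing IFS=: in front of read changes IFS for that single command only, so the rest of the script keeps the default splitting.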

Upvotes: 1

cdarke

Reputation: 44444

This is POSIX behaviour. When read assigns to a variable, IFS characters are stripped; in the words of the standard, "the results shall be split into fields as in the shell for the results of parameter expansion". (Of course, -n and -N themselves are not POSIX.)

This is borne out by the comments in the read source code:

/* This code implements the Posix.2 spec for splitting the words
     read and assigning them to variables. */
  orig_input_string = input_string;

  /* Remove IFS white space at the beginning of the input string.  If
     $IFS is null, no field splitting is performed. */
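
The last sentence of that comment is what makes IFS= read keep white space. A minimal sketch with whole lines instead of single characters follows; the helper names trim_read and raw_read and the sample input are made up for illustration.

```shell
# trim_read: default IFS, so leading and trailing blanks are stripped.
# (hypothetical helper for illustration)
trim_read() {
    read -r line
    printf '[%s]\n' "$line"
}

# raw_read: IFS is empty for this read only, so no field splitting
# is performed and the blanks survive.
# (hypothetical helper for illustration)
raw_read() {
    IFS= read -r line
    printf '[%s]\n' "$line"
}

printf '  padded  \n' | trim_read
printf '  padded  \n' | raw_read
```

The first call prints the word with its padding removed, the second prints it with both pairs of spaces intact.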

Upvotes: 4

masoud

Reputation: 56549

In my opinion, with the -N option read behaves differently at two stages:

  • Reading a delimiter as input
  • Assigning that delimiter to a variable

While reading, a delimiter is treated the same as a non-delimiter and is counted towards NCHARS. But when read assigns the value, it checks whether the character read is a delimiter; if it is, read assigns an empty (null) string to the corresponding variable.

So IFS= changes how white space is assigned to the variable: a space is assigned to c rather than an empty string.

Upvotes: 3
