tlrrd

Reputation: 322

Why does taking stdin from a file differ from receiving it over a pipe?

Using bash, I often want to print the header row of a large CSV file and search the rest for a particular entry. I do it as follows:

$ (head -1; grep mike) < tmp.csv
name,age,favourite colour
mike,38,blue

But taking the input from cat, or any other command, doesn't work: it seems grep never gets passed the remainder of the file.

$ cat tmp.csv | (head -1; grep mike)
name,age,favourite colour

Why is there different behaviour in these two cases?

Upvotes: 8

Views: 449

Answers (3)

evil otto

Reputation: 10582

The difference between reading from a pipe and reading from a file is that you can lseek on a file, but not on a pipe.

The behavior here looks (as seen through strace) like it's coming from head, not bash. head reads a whole buffer, finds the requested number of lines, then calls lseek to move backwards to the point just after the last line it output, leaving the shared file offset at that place so that grep picks up from there. As above, this works if it's reading a file, but not if it's reading from a pipe: the lseek fails, and whatever head read beyond the first line is gone before grep starts.
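
If you want something that behaves the same on a file and on a pipe, a minimal sketch is to take the header with the shell's read builtin instead of head. As far as I know, read does not consume pipe input past the newline (bash reads a byte at a time when it can't seek), so grep still sees the rest. header_and_grep is just a name I made up for illustration:

```shell
# header_and_grep PATTERN
# Hypothetical helper: print the first line of stdin, then every
# remaining line that matches PATTERN.
header_and_grep() {
    # read stops consuming input at the first newline, so the rest of
    # stdin is left for grep whether stdin is a file or a pipe
    IFS= read -r header || return 1
    printf '%s\n' "$header"
    grep -- "$1"
}
```

With the tmp.csv from the question, both `header_and_grep mike < tmp.csv` and `cat tmp.csv | header_and_grep mike` should print the header followed by the mike line.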

I can't think of any case other than what you're doing where this behavior in head makes sense, but there it is. Learn something new every day, I tell ya ...

Upvotes: 7

choroba

Reputation: 242343

Very strange. You should not rely on this undocumented behaviour; use something like this instead:

sed -n '1p;/mike/p' tmp.csv
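
A possible awk equivalent, as a sketch: it prints line 1 only once even if the header itself matches the pattern (the sed version prints it twice in that case), and it reads stdin when no file is given, so it also works behind a pipe. header_or_match is a hypothetical name, and because the pattern is passed with -v, awk will process backslash escapes in it:

```shell
# header_or_match PATTERN [FILE...]
# Hypothetical helper: print line 1 unconditionally, plus every line
# matching the regular expression PATTERN.
header_or_match() {
    pattern=$1
    shift
    # NR == 1 selects the header; $0 ~ pat selects matching lines.
    # With no FILE arguments awk reads standard input.
    awk -v pat="$pattern" 'NR == 1 || $0 ~ pat' "$@"
}
```

Usage would be `header_or_match mike tmp.csv` or `cat tmp.csv | header_or_match mike`.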

Upvotes: 3

Rob Napier

Reputation: 299633

I can't reproduce this reliably with bash 3.2.48. Either both succeed or both fail. But the underlying reason for the failures is how large the file is.

cat reads one buffer (4k-64k depending on the system) and hands it down the pipe. head consumes the entire buffer and then exits. grep then only has access to whatever comes after that first buffer. On my system, I can use your pipe only to grep for things further than one buffer into the file (so I can grep for things at the end of a long file, but not for things near the beginning once head has run).

It is possible that later versions of bash optimize the < operator (but not cat) to allow your trick to work, but I don't believe this is supported behavior.

Upvotes: 0
