duqu

Reputation: 25

Strange result when reading the first and last line of cat output with 'head' and 'tail'

Bash version 4.4.7.

According to a tutorial, this gets the first and the last lines of a file:

cat txt_file | (head -n1 && tail -n1)

For a large file (I don't know the exact threshold, but a file with thousands of lines is enough) this command works as expected. For a small file it does not. For example, given this file:

11111111
22222222
33333333
44444444

The output of the command above is only the first line:

11111111

Another command, using awk, works with both files:

awk 'NR==1; END{print}' txt_file
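One reason the awk version is unaffected: a single process reads the whole stream, so nothing can be lost between two readers. A slightly more defensive sketch follows. The function name first_last_awk is hypothetical, and it stores the last line explicitly instead of relying on $0 still holding it inside END, which some older awk implementations may not guarantee. Unlike the one-liner above, it prints a one-line input only once.

```shell
# first_last_awk: hypothetical helper, not part of the original question.
# Reads stdin (or the files given as arguments) in a single awk process.
first_last_awk() {
    awk '
        NR == 1 { print }                    # first line: print immediately
                { last = $0 }                # remember the most recent line
        END     { if (NR > 1) print last }   # last line, unless it is also the first
    ' "$@"
}

# Example use on a short stream, the case that failed with head && tail.
seq 10 | first_last_awk
```

It can also be called with a file name, for example first_last_awk txt_file.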

Upvotes: 1

Views: 277

Answers (2)

RavinderSingh13

Reputation: 133528

Try the following as well. A sed solution:

sed -n '1p;$p' <(seq 1000)

perl solution:

seq 100 |  perl -ne 'print if 1..1 or eof'

A bash solution that uses read and tail (no head):

seq 100 | { IFS= read -r line; echo "$line"; tail -1; }
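The read-based variant works because the shell's read consumes only one line when stdin is a pipe, so the rest of the stream is still there for tail. As a rough sketch, it could be wrapped in a function that also copes with empty input and with a final line that lacks a trailing newline. The name first_last is hypothetical, not something from the answer above.

```shell
# first_last: hypothetical helper wrapping the read + tail idea.
first_last() {
    local first
    # read returns non-zero at EOF; keep going only if it still captured text
    # (input whose single line has no trailing newline), otherwise print nothing.
    IFS= read -r first || [ -n "$first" ] || return 0
    printf '%s\n' "$first"
    # tail sees everything after the first line; on one-line input it prints nothing.
    tail -n 1
}

seq 10 | first_last
```

printf is used instead of echo so that a first line such as -n or one containing backslashes is printed unchanged.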

Upvotes: 2

donkopotamus

Reputation: 23186

Your "question" is not actually posed as a question at the moment; it's merely an observation. To explain that observation, consider the difference between the output of:

$ seq 10 | (head -1 && tail -1)
1

and

$ seq 1000 | (head -1 && tail -1)
1
1000

What is happening here? Our pipeline is working as follows:

  • send lines (numbers in this case, but it's no different from your cat example) to stdout;
  • on the reading end of the pipe we have:

    • first, a head ... it will print the first line and then end;
    • next, a tail ... it will begin after the head has run and print the last line.

However, by default head does not read its input line by line, or even character by character until it finds a line break. Instead it reads the input in chunks (a buffered read). That chunk might be 2048 bytes, for example.

So our pipeline is really:

  • send lines (numbers in this case, but it's no different from your cat example) to stdout;
  • on the reading end of the pipe we have:

    • first, a head ... it will read the first 2 KB from stdin, print the first line and then end;
    • next, a tail ... it will read only whatever remains after that first 2 KB, because it never sees the data head already consumed. For a small input nothing remains, so tail prints nothing.
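The effect of the buffered read can be made visible by counting how many bytes are still on the pipe after the first reader is done. This is only a sketch: the two function names are hypothetical, and the result for head assumes an implementation that reads its input in chunks (GNU coreutils does; the exact chunk size varies).

```shell
# Both helpers are hypothetical; each reads one line from stdin and then
# reports how many bytes are left for the next reader.

leftover_after_head() {
    # head prints one line but has already pulled a whole chunk off the pipe.
    { head -n 1 >/dev/null; wc -c; } | tr -d ' '
}

leftover_after_read() {
    # The shell's read stops right after the first newline on a pipe.
    { IFS= read -r _line; wc -c; } | tr -d ' '
}

# 8 bytes of input, of which the first line takes 2.
printf '1\n2\n3\n4\n' | leftover_after_head   # assumption: buffered head, so 0 bytes remain
printf '1\n2\n3\n4\n' | leftover_after_read   # 6 bytes remain
```

With an input larger than the chunk size, leftover_after_head would report a non-zero count, which is why tail still finds the last line in the seq 1000 case.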

If your goal is to only generate the output of the first command (your cat) once, then you could use tee, something like this perhaps:

$ seq 10 | tee >(tail -1) | head -2

Also note that on Linux, you could alter the buffering of the first command, something like:

$ stdbuf -oL seq 10 | (head -1 && tail -1)

but this won't work if your command adjusts the buffering of its own streams (see the stdbuf documentation).

Upvotes: 4
