Sam

Reputation: 2331

Print first few and last few lines of file through a pipe with "..." in the middle

Problem Description

This is my file

1
2
3
4
5
6
7
8
9
10

I would like to pipe the cat output of this file through a command and get this:

% cat file | some_command
1
2
...
9
10

Attempted solutions

Here are some solutions I've tried, with their output:

% cat temp | (head -n2 && echo '...' && tail -n2)
1
2
...
% cat temp | tee >(head -n3) >(tail -n3) >/dev/null
1
2
3
8
9
10
# I don't know how to get the ...
% cat temp | sed -e 1b -e '$!d'
1
10

% cat temp | awk 'NR==1;END{print}'
1
10
# Can only get 2 lines

Upvotes: 16

Views: 1213

Answers (4)

markp-fuso

Reputation: 34144

Assumptions:

  • as OP has stated, a solution must be able to work with a stream from a pipe
  • the total number of lines coming from the stream is unknown
  • if the total number of lines is less than the sum of the head/tail offsets then we'll print duplicate lines (we can add more logic if OP updates the question with more details on how to address this situation)

A single-pass awk solution that implements a queue in awk to keep track of the most recent N lines; the queue allows us to limit awk's memory usage to just N lines (as opposed to loading the entire input stream into memory, which could be problematic when processing a large volume of lines/data on a machine with limited available memory):

h=2 t=3

cat temp | awk -v head=${h} -v tail=${t} '
    { if (NR <= head) print $0
      lines[NR % tail] = $0
    }

END { print "..."

      if (NR < tail) i=0
      else           i=NR

      do { i=(i+1)%tail
           print lines[i]
         } while (i != (NR % tail) )
    }'

This generates:

1
2
...
8
9
10

Demonstrating the overlap issue:

$ cat temp4
1
2
3
4

With h=3;t=3 the proposed awk code generates:

$ cat temp4 | awk -v head=${h} -v tail=${t} '...'
1
2
3
...
2
3
4

Whether or not this is the 'correct' output will depend on OP's requirements.
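
If duplicate lines are not wanted, one possible variant (a sketch only, not necessarily what OP needs) keeps only the lines past the head in the queue and prints "..." only when lines were actually skipped. The function name head_tail and its two positional arguments are hypothetical, and both counts are assumed to be non-negative integers:

```shell
# Hypothetical helper: print the first $1 and last $2 lines of stdin,
# with "..." in between only when some lines were skipped.
head_tail() {
    awk -v head="$1" -v tail="$2" '
        NR <= head { print; next }        # head lines are printed right away
        tail > 0   { buf[NR % tail] = $0 } # queue holding at most "tail" lines
        END {
            if (NR > head + tail) print "..."    # at least one line was skipped
            start = NR - tail + 1
            if (start <= head) start = head + 1  # never reprint a head line
            for (i = start; i <= NR; i++) print buf[i % tail]
        }'
}
```

With this sketch, `cat temp | head_tail 2 3` gives the same output as above, while `cat temp4 | head_tail 3 3` prints lines 1 through 4 once each, without the "...". Memory use stays limited to the queue of at most `tail` lines.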

Upvotes: 0

dawg

Reputation: 103764

An awk:

awk -v head=2 -v tail=2 'FNR==NR && FNR<=head
FNR==NR && cnt++==head {print "..."}
NR>FNR && FNR>(cnt-tail)' file file

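Or, if a single pass is important (and memory allows), you can use perl. Note that -a splits the slurped input on any whitespace, so this assumes the lines themselves contain no spaces: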

perl -0777 -lanE 'BEGIN{$head=2; $tail=2;}
END{say join("\n", @F[0..$head-1],("..."),@F[-$tail..-1]);}' file

Or, an awk that is one pass:

awk -v head=2 -v tail=2 'FNR<=head
{lines[FNR]=$0}
END{
    print "..."
    for (i=FNR-tail+1; i<=FNR; i++) print lines[i]
}' file

Or, there is nothing wrong with the direct 'caveman' approach:

head -2 file; echo "..."; tail -2 file

Any of these prints:

1
2
...
9
10

In terms of efficiency, here are some stats.

For small files (i.e., less than 10 MB or so), all of these take less than 1 second, and the 'caveman' approach takes 2 ms.

I then created a 1.1 GB file with seq 99999999 >file

  • Two-pass awk: 50 seconds
  • One-pass perl: 10 seconds
  • One-pass awk: 29 seconds
  • 'Caveman': 2 ms
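
Note that the 'caveman' approach reads its input twice, so it needs a regular file rather than a pipe. A minimal sketch for using it on a stream, assuming it is acceptable to spool the whole stream to a temporary file first (the function name head_tail_tmp is hypothetical):

```shell
# Hypothetical helper: spool stdin to a temporary file, then run the
# direct head/echo/tail approach on that file.
# The body runs in a subshell so that "tmp" does not leak into the caller.
head_tail_tmp() (
    tmp=$(mktemp) || exit 1
    cat > "$tmp"             # copy the whole stream to disk
    head -n "$1" "$tmp"      # first $1 lines
    echo "..."
    tail -n "$2" "$tmp"      # last $2 lines
    rm -f "$tmp"
)
```

For example, `cat file | head_tail_tmp 2 2` prints the same five lines as above. The timings listed here were taken on a regular file; spooling adds the cost of writing the stream to disk, which is not measured.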

Upvotes: 8

M. Nejat Aydin

Reputation: 10123

Two single-pass sed solutions (both hardcoded for the first two and last two lines):

sed '1,2b
     3c\
...
     N
     $!D'

and

sed '1,2b
     3c\
...
     $!{h;d;}
     H;g'
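
For readers less familiar with sed, here is the first script again with comments, as a sketch of how it can be read. The wrapper function name head_tail_sed is hypothetical; the sed commands are the same as above:

```shell
# Hypothetical wrapper around the first sed script (first 2 / last 2 lines).
head_tail_sed() {
    sed '
        # Lines 1-2: branch to the end of the script, which auto-prints them.
        1,2b
        # Line 3: replace it with the marker text and start the next cycle.
        3c\
...
        # Line 4 onward: append the next input line to the pattern space.
        N
        # Unless the last line has been read, delete the older of the two
        # lines and restart the cycle without reading new input.
        $!D
    '
}
```

On the 10-line sample, `cat temp | head_tail_sed` prints 1, 2, "...", 9 and 10: once the last line has been appended by N, D is skipped and the two lines left in the pattern space are printed.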

Upvotes: 1

anubhava

Reputation: 785008

You may consider this awk solution:

awk -v top=2 -v bot=2 'FNR == NR {++n; next} FNR <= top || FNR > n-bot; FNR == top+1 {print "..."}' file{,}

1
2
...
9
10

Upvotes: 1
