AdrianP.

Reputation: 443

Grab the first and last element for every sequence

I have a file with three types of sequences and their positions, which recur like this:

seq1 2
seq1 5
seq1 10
seq3 15
seq3 34
seq3 60
seq2 100
seq2 110
seq2 200
seq3 210
seq3 250
seq3 300
seq1 310
seq1 330
seq1 400

The second value is a unique position, and the file is sorted by it, which is why the sequences are scattered.

Each time a sequence starts, I want to grab the minimum and maximum position of that run. The output should be (seq min max):

seq1 2 10
seq3 15 60
seq2 100 200
seq3 210 300
seq1 310 400

Is it possible to do this in bash with awk or anything else?

Upvotes: 1

Views: 72

Answers (2)

stack0114106

Reputation: 8711

Another awk approach:

$ awk ' { if(NR>1 && p!=$1) { print p,min,max; min="" } if(min=="") min=$2; max=$2; p=$1 }
END { print p,min,max } ' adrian.txt
seq1 2 10
seq3 15 60
seq2 100 200
seq3 210 300
seq1 310 400
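
The one-liner above takes the first and last positions of each run as its minimum and maximum, which holds because the file is sorted by position. If that ordering could not be relied on, a sketch along these lines would compare the values numerically instead. The function name `run_ranges` is hypothetical, and the sample input is made up for illustration:

```shell
# Hypothetical helper: reads "name position" lines from a file argument or stdin.
run_ranges() {
  awk '
    # The name in column 1 changed: print the run that just ended.
    NR > 1 && $1 != prev { print prev, min, max }
    # First line, or first line of a new run: reset the range.
    NR == 1 || $1 != prev { prev = $1; min = max = $2 + 0; next }
    # Same run: widen the range using numeric comparisons.
    {
      if ($2 + 0 < min) min = $2 + 0
      if ($2 + 0 > max) max = $2 + 0
    }
    # Print the final run, but nothing at all for empty input.
    END { if (NR) print prev, min, max }
  ' "$@"
}

# Sample input with an unsorted first run (illustrative only).
printf '%s\n' 'seq1 5' 'seq1 2' 'seq1 10' 'seq3 15' | run_ranges
```

On sorted input such as the file in the question, this should give the same result as tracking the first and last values.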

Upvotes: 2

anubhava

Reputation: 784948

You may use this awk:

awk 'p != $1 {if (NR>1) print p, first, last; first=$2} {p=$1; last=$2} 
END{print p, first, last}' file

seq1 2 10
seq3 15 60
seq2 100 200
seq3 210 300
seq1 310 400
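
Since the question also asks about bash itself, here is a minimal sketch of the same first/last logic as a plain `while read` loop. The function name `first_last` is hypothetical, and the sample input is a shortened version of the data in the question. For large files, awk will typically be much faster than a shell loop.

```shell
# Hypothetical helper: reads "name position" lines from stdin.
first_last() {
  local name pos prev='' first='' last=''
  # The "|| [ -n "$name" ]" part keeps a final line that lacks a trailing newline.
  while read -r name pos || [ -n "$name" ]; do
    [ -n "$name" ] || continue            # skip blank lines
    if [ "$name" != "$prev" ]; then
      # Name changed: print the finished run, then start a new one.
      if [ -n "$prev" ]; then
        printf '%s %s %s\n' "$prev" "$first" "$last"
      fi
      prev=$name
      first=$pos
    fi
    last=$pos
  done
  # Print the last run, if there was any input at all.
  if [ -n "$prev" ]; then
    printf '%s %s %s\n' "$prev" "$first" "$last"
  fi
}

# Shortened sample of the data from the question.
printf '%s\n' 'seq1 2' 'seq1 5' 'seq1 10' 'seq3 15' 'seq3 34' | first_last
```

With a real file it would be called as `first_last < file`.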

Upvotes: 3
