Tiago Bruno

Reputation: 413

How can I use awk to find consecutive patterns in lines?

I am trying to write an awk script that counts runs of consecutive identical patterns in the 3rd field and prints the first and last coordinate (2nd field) of each run, as in the example below.

I have a script that counts the number of patterns in any coordinate window I choose, for example 1000000, centering the data at the middle of the window:

awk '{a[$1 FS 1000000*int(($2-1)/1000000)+500000]++} END{for(k in a) print k,a[k]}' file

However, it counts all patterns, regardless of whether they are 1/1 or 0/1.

17 38172452 1/1
17 38172942 1/1
17 38172973 1/1  
17 38173143 0/1
17 38176256 0/1
17 38176476 1/1
17 38178149 0/1
17 38178627 0/1
17 38179275 0/1
17 38179290 0/1
17 38179492 0/1
17 38179667 1/1
17 38182229 0/1
17 38183090 0/1
17 38183505 0/1
17 38188419 0/1
17 38188844 0/1
17 38189049 0/1

Expected result:

17 38172452 38172973 3 1/1
17 38173143 38176256 2 0/1
17 38178149 38179492 5 0/1
17 38182229 38189049 6 0/1

Can you guys help me out with this?

Upvotes: 0

Views: 133

Answers (1)

karakfa

Reputation: 67467

Assuming $1 does not change:

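awk '{if(p==$3) {c++; e=$2}            # same pattern: extend the current run
      else {if(c>1) print $1,b,e,c,p;  # pattern changed: print the finished run if longer than 1
            b=$2; c=1; p=$3}}          # start a new run at this line
 END {if(c>1) print $1,b,e,c,p}' file  # print the last run under the same condition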
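
If $1 can change (for example, a file covering several chromosomes), a run should probably also end when the first field changes. The following is a minimal sketch under that assumption, not a tested drop-in replacement; the wrapper name `consecutive_runs` and the awk function name `emit` are made up for illustration. It prints the fields in the order of the expected result in the question (first field, start, end, count, pattern) and skips runs of length 1.

```shell
# Hypothetical helper: reads the three-column input from a file or from stdin.
consecutive_runs() {
  awk '
    # Print the finished run, but only if it spans more than one line.
    function emit() { if (c > 1) print chr, b, e, c, p }

    # Skip blank or incomplete lines.
    NF < 3 { next }

    # A change in the first field ($1) or in the pattern ($3) ends the current run.
    $1 != chr || $3 != p { emit(); chr = $1; b = $2; c = 0; p = $3 }

    # Every remaining line extends the current run and moves its end coordinate.
    { c++; e = $2 }

    # Print the last run under the same length condition.
    END { emit() }
  ' "$@"
}
```

Called as `consecutive_runs file` on the sample data from the question, this prints the four lines listed under "Expected result".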

Upvotes: 1
