Reputation: 413
I'm trying to write an awk script that counts runs of consecutive identical patterns in the 3rd field and prints the first and last coordinate (2nd field) of each run, as in the example below.
I already have a script that counts the number of patterns in any coordinate window I choose, for example windows of 1000000 labelled by their midpoint:
awk '{a[$1 FS 1000000*int(($2-1)/1000000)+500000]++} END{for(k in a) print k,a[k]}' file
However, it counts all patterns together, regardless of whether they are 1/1 or 0/1.
My input looks like this:
17 38172452 1/1
17 38172942 1/1
17 38172973 1/1
17 38173143 0/1
17 38176256 0/1
17 38176476 1/1
17 38178149 0/1
17 38178627 0/1
17 38179275 0/1
17 38179290 0/1
17 38179492 0/1
17 38179667 1/1
17 38182229 0/1
17 38183090 0/1
17 38183505 0/1
17 38188419 0/1
17 38188844 0/1
17 38189049 0/1
Expected result (runs of a single line are left out):
17 38172452 38172973 3 1/1
17 38173143 38176256 2 0/1
17 38178149 38179492 5 0/1
17 38182229 38189049 6 0/1
Can you guys help me out with this?
Upvotes: 0
Views: 133
Reputation: 67467
Assuming $1 does not change:
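awk '{if(p==$3) {c++; e=$2}             # same pattern as the previous line: extend the run
else {if(c>1) print $1,b,e,c,p;         # pattern changed: print the finished run if longer than 1
b=$2; c=1; p=$3}}                       # start a new run at the current line
END {if(c>1) print $1,b,e,c,p}' file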
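If $1 can change between lines, something along these lines might work. It is only a sketch: the names count_runs and emit are made up for illustration, the input is assumed to be whitespace-separated and already sorted by $1 and $2, and a run is assumed to end whenever either the pattern or $1 changes.

```shell
# Hypothetical helper: reads "chrom pos pattern" lines on stdin and prints
# "chrom first last count pattern" for every run longer than one line.
count_runs() {
  awk '
    # Print the run collected so far, but only if it has more than one line.
    function emit() { if (c > 1) print chr, b, e, c, p }

    # Skip lines that do not have all three fields.
    NF < 3 { next }

    # A run ends when the pattern or the chromosome changes.
    $3 != p || $1 != chr { emit(); chr = $1; b = $2; c = 0; p = $3 }

    # Every line extends the current run and moves its end coordinate.
    { c++; e = $2 }

    # Do not forget the run that is still open at end of input.
    END { emit() }
  '
}
```

It would be called as count_runs < file. Keeping the chromosome in its own variable means the finished run is printed with the $1 it started on, not the $1 of the line that ended it.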
Upvotes: 1