Reputation: 560
I have a file with more than 70,000 lines and 11 columns. The 4th column is a position. I want to count every line whose position is more than 100 higher than the position in the line above and more than 100 lower than the position in the line below.
I would like to do this without importing the data into R, so that I can put it into my shell script. However, I am not very experienced with shell scripting.
Example of data:
x y z 1
x y z 80
x y z 200
x y z 310
x y z 390
x y z 500
x y z 830
I want to count the 3rd and the 6th rows, because their values in the 4th column fulfill my requirements, so the output here should be "2".
I have tried to search for information on how to do this but have been stuck for some time now.
Upvotes: 1
Views: 279
Reputation: 786359
You can use this awk:
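awk '# candidate middle line (sv, srec) is confirmed when the next line is more than 100 higher
srec && NR==srec+1 && $4>sv+100{count++; sv=srec=0}
# current line is more than 100 above the previous one, so it becomes the candidate
frec && NR==frec+1 && $4>fv+100{fv=frec=0; sv=$4; srec=NR}
# remember the current line as the previous line
{fv=$4; frec=NR}
END{print count}' file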
2
Upvotes: 1
Reputation: 67567
awk to the rescue!
$ awk '$4>p+100 && p>pp+100{c++} {pp=p;p=$4} END{print c}' file
2
Explanation
$4>p+100 && p>pp+100{c++}
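If field 4 is greater than the previous value + 100 AND the previous value is greater than the value before it + 100, increment the counter (which starts at zero). The line being counted is the previous one, i.e. the middle of the three.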
{pp=p;p=$4}
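Shift the stored values: the old previous value becomes the "previous previous" value, and the current field 4 becomes the new previous value.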
END{print c}
When all rows have been processed, print the counter.
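Two edge cases may be worth guarding against if this goes into a script. Unset awk variables evaluate to 0, so if the very first position in the file is above 100, the one-liner can count line 1 even though it has no line above it. And if nothing matches, the counter is never set and an empty line is printed instead of 0. Below is a minimal sketch of a variant that handles both, assuming whitespace-separated columns as in the question. The function name count_spaced and the temporary sample file are only illustrative and not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration): count lines whose 4th
# field is more than 100 above the line before and more than 100 below the
# line after.
count_spaced() {
    awk '
        # NR > 2 guarantees that both p and pp were read from real lines,
        # so the first line can never be counted as a "middle" line.
        NR > 2 && $4 > p + 100 && p > pp + 100 { c++ }
        # Shift the window: previous becomes previous-previous, current becomes previous.
        { pp = p; p = $4 }
        # c + 0 prints 0 rather than an empty line when nothing matched.
        END { print c + 0 }
    ' "$1"
}

# Sample data from the question, written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'x y z 1' 'x y z 80' 'x y z 200' 'x y z 310' \
              'x y z 390' 'x y z 500' 'x y z 830' > "$sample"

result=$(count_spaced "$sample")
echo "$result"
rm -f "$sample"
```

On the sample data this prints 2, the same as the one-liner. Capturing the output with $(...) as above is one way to use the count later in a shell script.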
Upvotes: 3