Mads Obi

Reputation: 560

Loop over column of file in bash

I have a file of more than 70,000 lines and 11 columns. The 4th column is a position. I want to count every line whose position is more than 100 higher than the position in the line above it and more than 100 lower than the position in the line below it.

I would like to do this without importing the data into R, so that I can put it in my shell script. However, I am not very experienced with shell scripting.

Example of data:

x    y    z    1
x    y    z    80
x    y    z    200
x    y    z    310
x    y    z    390
x    y    z    500
x    y    z    830

I want to count the 3rd and 6th rows, because their values in the 4th column meet my requirements, so the output here should be "2".
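To make the rule concrete, here is a minimal sketch that spells it out literally: it stores column 4 of every row in an awk array and then checks each interior row against both neighbours. It assumes whitespace-separated columns, and the function name count_rows is hypothetical. Keeping one column of roughly 70,000 numbers in memory should not be a problem.

```shell
# count_rows: hypothetical helper name. Reads whitespace-separated data from
# the given files (or from stdin) and counts interior rows whose 4th column is
# more than 100 above the row before it and more than 100 below the row after it.
count_rows() {
    awk '
        { v[NR] = $4 }                    # keep column 4 of every row
        END {
            n = 0
            for (i = 2; i < NR; i++)      # first and last rows lack a neighbour
                if (v[i] > v[i-1] + 100 && v[i] < v[i+1] - 100)
                    n++
            print n
        }
    ' "$@"
}
```

With the example data, printf 'x y z %s\n' 1 80 200 310 390 500 830 | count_rows prints 2.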

I have tried to search for information on how to do this but have been stuck for some time now.

Upvotes: 1

Views: 279

Answers (2)

anubhava

Reputation: 786359

You can use this awk:

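awk '
     # the candidate from the previous row is counted if this row is more than 100 above it
     srec && NR==srec+1 && $4>sv+100{count++; sv=srec=0}
     # a row more than 100 above the previous row becomes the new candidate
     frec && NR==frec+1 && $4>fv+100{fv=frec=0; sv=$4; srec=NR}
     # always remember the current row as the previous one
     {fv=$4; frec=NR}
     END{print count+0}' file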

2
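The awk command above prints the count on standard output, so inside a shell script the result can be captured in a variable with command substitution. A minimal sketch, assuming a hypothetical helper named count_gaps: it uses a sliding window of the two previous values instead of the candidate bookkeeping above, and passes the threshold in with awk's -v option so it is not hard-coded.

```shell
# count_gaps FILE [GAP]: hypothetical helper name.
# Prints how many rows have a column-4 value more than GAP above the value in
# the previous row and more than GAP below the value in the next row.
# GAP defaults to 100.
count_gaps() {
    awk -v gap="${2:-100}" '
        # p = previous value, pp = the value before it; NR > 2 ensures the
        # middle row (the previous one) really has a row above it
        NR > 2 && $4 > p + gap && p > pp + gap { c++ }
        { pp = p; p = $4 }
        END { print c + 0 }   # +0 prints 0 rather than an empty line
    ' "$1"
}
```

For example, n=$(count_gaps data.txt) stores 2 in n when data.txt (a hypothetical file name) holds the example data, and count_gaps data.txt 50 would use a gap of 50 instead.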

Upvotes: 1

karakfa

Reputation: 67567

awk to the rescue!

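$ awk 'NR>2 && $4>p+100 && p>pp+100{c++} {pp=p; p=$4} END{print c+0}' file
2

Explanation

  • NR>2 && $4>p+100 && p>pp+100{c++}: from the 3rd row on (so the first row, which has no row above it, is never counted), if field 4 is more than 100 greater than the previous value p AND p is more than 100 greater than the value before it, pp, then the previous row meets both conditions, so increment the counter.

  • {pp=p; p=$4}: shift the window, so the previous value becomes the "previous previous" value and the current field 4 becomes the previous value.

  • END{print c+0}: when all rows are done, print the counter; adding 0 makes awk print 0 instead of an empty line when nothing matched.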
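To check which rows were actually counted, the same previous/previous-previous comparison can print each match before the total. A minimal sketch; the helper name list_matches is hypothetical, and the NR > 2 guard keeps the first row (which has no row above it) from ever being reported.

```shell
# list_matches: hypothetical helper name. Reads the given files (or stdin) and
# prints the row number and column-4 value of every matching row, then the total.
list_matches() {
    awk '
        NR > 2 && $4 > p + 100 && p > pp + 100 {
            print NR - 1, p               # the match is the previous row
            c++
        }
        { pp = p; p = $4 }
        END { print "count:", c + 0 }
    ' "$@"
}
```

On the example data it prints 3 200 and 6 500 on separate lines, followed by count: 2.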

Upvotes: 3
