user1308144

Reputation: 475

Using awk to eliminate records that match on field 1 and are within a defined distance on field 2

I have a problem that I am trying to solve with awk. It applies to selecting good-quality single nucleotide polymorphisms (SNPs) for placing on a SNP chip, where there is a requirement that no SNP is within 60 bp of another SNP. The file looks like this:

comp1008_seq1 20
comp1008_seq1 234
comp1008_seq1 260
comp1008_seq1 500
comp3044_seq1 300
comp3044_seq1 350
comp3044_seq1 460
comp3044_seq1 600
................

I want to print only records that are not within ±60 (based on field 2) of another record from the same component (based on field 1). It doesn't matter if records are within ±60 of each other when they are from different components. For the example above, the output should look like this:

comp1008_seq1 20
comp1008_seq1 234
comp1008_seq1 500
comp3044_seq1 300
comp3044_seq1 460
comp3044_seq1 600

Upvotes: 0

Views: 65

Answers (1)

Amir Gonnen

Reputation: 3727

http://ideone.com/h6oEI

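# Assumes the input is sorted by field 1, then numerically by field 2.
{
        # Print when the component changes, or when the position is more
        # than 60 away from the previous record.
        if ($1 != last1 || abs($2 - last2) > 60) print
        # Remember this record for the next comparison.
        last1 = $1; last2 = $2
}

# awk has no built-in abs(), so define one.
function abs(x) {
        return x > 0 ? x : -x
}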
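
For anyone who wants to try this without following the link, here is a minimal sketch of running the script above from a shell. The function name filter_snps is made up for illustration; the awk logic is the same as in the answer, and the input is assumed to be sorted by field 1 and then by field 2, as in the question.

```shell
# Hypothetical wrapper (filter_snps is an illustrative name).
# Reads from the files given as arguments, or from stdin if there are none.
# Assumes input sorted by component (field 1), then position (field 2).
filter_snps() {
    awk '
    function abs(x) { return x > 0 ? x : -x }
    {
        # Print when the component changes, or when the position is
        # more than 60 away from the previous record.
        if ($1 != last1 || abs($2 - last2) > 60) print
        last1 = $1; last2 = $2
    }' "$@"
}

# Sample input taken from the question.
printf '%s\n' \
    'comp1008_seq1 20' \
    'comp1008_seq1 234' \
    'comp1008_seq1 260' \
    'comp1008_seq1 500' \
    'comp3044_seq1 300' \
    'comp3044_seq1 350' \
    'comp3044_seq1 460' \
    'comp3044_seq1 600' \
    | filter_snps
```

On the sample input this prints the six lines listed as the expected output in the question. Note that last2 is updated on every record, so each record is compared with the line immediately before it, even if that line was itself dropped. In a run of closely spaced positions (for example 100, 150, 200 in one component) only the first is printed; if the comparison should be against the last printed record instead, the assignments would have to move inside the if.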

Upvotes: 3
