Nitish

Reputation: 159

Code to get lines which have values less than or equal to given values in both columns

I have data with three columns. The first column holds a name, while the second and third columns each hold one or more semicolon (;) separated values.

I want to print the rows where at least one pair of semicolon-separated values (the i-th Distance together with the i-th MAF) has Distance <= 10 and MAF >= 0.5.

I would be happy if someone could provide R code; if not R, then AWK/sed.

Example

ID          Distance  MAF
cg12044689  8;40      0.000200;0.59
cg12143629  0;1;3     0.000200;0.520;0.0413
cg12247699  42        0.599
cg12375698  1;10      0.00231;0.51

Output should be:

ID          Distance  MAF
cg12143629  0;1;3     0.000200;0.520;0.0413
cg12375698  1;10      0.00231;0.51

Upvotes: 1

Views: 38

Answers (1)

Thor

Reputation: 47189

Here is an awk script that accomplishes the task by splitting and comparing the pairwise values:

parse.awk

{
  # For each row, split the distance and maf columns into the dist and maf arrays
  n = split($2, dist, ";"); split($3, maf, ";")
  # Walk the pairs from 1 to n and print the row once, at the first matching pair
  for (i = 1; i <= n; i++)
    if (dist[i] <= 10 && maf[i] >= 0.5) {
      print
      break
    }
}

Run it like this:

awk -f ./parse.awk infile

Output:

cg12143629  0;1;3     0.000200;0.520;0.0413
cg12375698  1;10      0.00231;0.51
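
The expected output in the question keeps the header row, while the script above drops it. As a minimal sketch, assuming the header is always the first line and the columns are whitespace-separated, the variant below passes the header through unchanged. The function name `filter_pairs` is hypothetical, chosen for illustration only, and the `+ 0` is there to force a numeric comparison.

```shell
# Hypothetical helper (not part of the original answer): filter_pairs FILE
# Prints the header line, then every row in which at least one
# (Distance, MAF) pair satisfies Distance <= 10 and MAF >= 0.5.
filter_pairs() {
  awk '
    NR == 1 { print; next }              # assumption: line 1 is the header
    {
      n = split($2, dist, ";"); split($3, maf, ";")
      for (i = 1; i <= n; i++)
        if (dist[i] + 0 <= 10 && maf[i] + 0 >= 0.5) {
          print                          # print the row once
          break                          # and stop at the first matching pair
        }
    }
  ' "$1"
}
```

Run it like this:

filter_pairs infile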

Upvotes: 1
