Reputation: 159
I have data with three columns. The first column holds a name, while the second and third columns each hold one or more semicolon (;) separated values.
I want to print the rows in which at least one pair of corresponding semicolon-separated values has distance <= 10 and MAF >= 0.5.
I would be grateful if someone could provide R code; if not in R, then AWK/SED.
Example
ID Distance MAF
cg12044689 8;40 0.000200;0.59
cg12143629 0;1;3 0.000200;0.520;0.0413
cg12247699 42 0.599
cg12375698 1;10 0.00231;0.51
Output should be:
ID Distance MAF
cg12143629 0;1;3 0.000200;0.520;0.0413
cg12375698 1;10 0.00231;0.51
Upvotes: 1
Views: 38
Reputation: 47189
Here is an awk script that accomplishes the task by splitting and comparing the pairwise values:
parse.awk
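{
    # For each row, split the distance and maf columns into the dist and maf arrays
    n = split($2, dist, ";"); split($3, maf, ";")
    # split() fills indices 1..n, so compare the pairs at exactly those indices
    for (i = 1; i <= n; i++) {
        if (dist[i] <= 10 && maf[i] >= 0.5) {
            print
            break    # one matching pair is enough; avoid printing the row twice
        }
    }
}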
Run it like this:
awk -f ./parse.awk infile
Output:
cg12143629 0;1;3 0.000200;0.520;0.0413
cg12375698 1;10 0.00231;0.51
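The expected output in the question keeps the header line, which the script above drops. Below is a minimal sketch of a variant that passes the header through. It assumes the header is always the first line of the file and that a row should appear once if any pair matches. The wrapper name filter_rows is made up for illustration, and the "+ 0" is only there to force a numeric comparison.

```shell
# filter_rows FILE: hypothetical wrapper name, not part of the original answer.
# Prints the header (assumed to be line 1) plus every row in which at least
# one Distance/MAF pair satisfies distance <= 10 and MAF >= 0.5.
filter_rows() {
    awk '
    NR == 1 { print; next }              # pass the header through unchanged
    {
        n = split($2, dist, ";")         # distances, indexed 1..n
        split($3, maf, ";")              # MAF values at the same indices
        for (i = 1; i <= n; i++)
            if (dist[i] + 0 <= 10 && maf[i] + 0 >= 0.5) {
                print                    # print the row once, then stop
                break
            }
    }' "$1"
}
```

Calling filter_rows infile on the example data from the question should give the header followed by the cg12143629 and cg12375698 rows.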
Upvotes: 1