Reputation: 571
I am trying to extract a subset of my tab-delimited data based on information inside one column. For example, column 2 holds three scores separated by ";":
col1 col2
1 a=2;b=1.1;c=0
1 a=0.2;b=0.2;c=0.5
1 a=1.5;b=1.9;c=3.5
I would like to extract the rows whose b value is greater than 1. In this case, my desired output would be:
col1 col2
1 a=2;b=1.1;c=0
1 a=1.5;b=1.9;c=3.5
I tried to use awk, but I could not extract the information from within the column. Also, the order of the entries (a, b, c, etc.) is not always the same. It would be best to include 'b > 1' in the search criteria. Any suggestions?
Upvotes: 1
Views: 74
Reputation: 77105
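Since the order of the entries in column 2 can vary, you can split the field on ";" and "=" and then look for the key "b":
awk -F'\t' '
NR>1 {
    n = split($2, ary, /[;=]/)     # ary holds key1, value1, key2, value2, ...
    for (i = 1; i < n; i += 2) {   # visit the keys only (assumes key=value entries)
        if (ary[i] == "b" && ary[i+1] > 1) {
            print                  # print the matching row once
            break
        }
    }
    next
}1' file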
$ cat f
col1 col2
1 a=2;b=1.1;c=0
1 a=0.2;b=0.2;c=0.5
1 a=1.5;b=1.9;c=3.5
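$ awk -F'\t' '
NR>1 {
    n = split($2, ary, /[;=]/)     # ary holds key1, value1, key2, value2, ...
    for (i = 1; i < n; i += 2) {   # visit the keys only (assumes key=value entries)
        if (ary[i] == "b" && ary[i+1] > 1) {
            print                  # print the matching row once
            break
        }
    }
    next
}1' f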
col1 col2
1 a=2;b=1.1;c=0
1 a=1.5;b=1.9;c=3.5
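If you want the key and threshold to be adjustable, here is a rough sketch of a variation. It splits column 2 on ";" first and then each entry on "=", so a value that happens to be the string "b" is never mistaken for the key. It assumes every entry has the form key=value; the variable names key, min, input, and result are just illustrative, and the sample file is rebuilt from the data in the question.

```shell
# Recreate the sample data from the question in a temporary file.
input=$(mktemp)
printf 'col1\tcol2\n1\ta=2;b=1.1;c=0\n1\ta=0.2;b=0.2;c=0.5\n1\ta=1.5;b=1.9;c=3.5\n' > "$input"

# key and min are passed in with -v so the filter is not hard-coded.
result=$(awk -F'\t' -v key=b -v min=1 '
NR == 1 { print; next }               # always keep the header line
{
    n = split($2, pairs, ";")         # "a=2;b=1.1;c=0" -> "a=2", "b=1.1", "c=0"
    for (i = 1; i <= n; i++) {
        split(pairs[i], kv, "=")      # kv[1] is the name, kv[2] is the value
        if (kv[1] == key && kv[2] + 0 > min + 0) {
            print
            break                     # print each row at most once
        }
    }
}' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

Adding 0 to both sides forces a numeric comparison, so a value like 10 compares as a number rather than as a string.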
Upvotes: 4