Reputation: 826
I have the following dataset from a pairwise analysis (the first row contains just the sample IDs):
A B A C
1 1 1 0
1 2 1 1
1 0 1 2
I wish to compare the values in field 1 and field 2, then field 3 and field 4, and print the row number NR every time I see a 1 and 2 combination for the pair I am examining.
For example, for the pair A and B, I would want the output:
A B 2
For the pair A and C, I would want the output:
A C 3
I would want to proceed row by row, so I would likely need the code to include:
for i in {1..3}; do
awk 'NR=="'${i}'" {code}'
done
But I have no idea how to proceed in a pairwise fashion (i.e., compare field 1 and field 2, then field 3 and field 4, etc.).
How can I do this?
Upvotes: 0
Views: 196
Reputation: 203532
It's hard to say with such a minimal example, but this MAY be what you want:
$ cat tst.awk
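FNR==1 {                        # header line: remember the sample IDs
    for (i=1;i<=NF;i++) {
        name[i] = $i
    }
    next
}
{                               # data lines: test fields (1,2), (3,4), ...
    for (i=1;i<NF;i+=2) {
        if ( ($i == 1) && ($(i+1) == 2) ) {
            print name[i], name[i+1], NR-1    # NR-1 skips the header line
        }
    }
}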
$ awk -f tst.awk file
A B 2
A C 3
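To try the script above end to end, here is a minimal sketch that writes the question's sample data to a file and runs the same logic as a shell function. The file name pairs.txt and the function name pair_hits are hypothetical, and one detail is deliberately changed: it prints FNR - 1 rather than NR - 1, so the data row number restarts for each input file. With a single file, as in the question, both give the same result.

```shell
#!/bin/sh
# Sample data from the question; "pairs.txt" is a hypothetical file name.
cat > pairs.txt <<'EOF'
A B A C
1 1 1 0
1 2 1 1
1 0 1 2
EOF

# Hypothetical wrapper around the tst.awk logic. Unlike the original, it
# prints FNR - 1, so row numbers restart in each file given on the command line.
pair_hits() {
    awk '
    FNR == 1 {                          # header line: remember the sample IDs
        for (i = 1; i <= NF; i++) name[i] = $i
        next
    }
    {                                   # data lines: test fields (1,2), (3,4), ...
        for (i = 1; i < NF; i += 2)
            if ($i == 1 && $(i+1) == 2)
                print name[i], name[i+1], FNR - 1
    }' "$@"
}

pair_hits pairs.txt
```

Run on two files at once (for example pair_hits pairs.txt pairs.txt), this version reports rows 2 and 3 for each file, whereas NR - 1 would keep counting across files.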
Upvotes: 4
Reputation: 753845
You should only run the script once; there's no need to run awk more than once. It isn't yet entirely clear how you want multiple matches printed. However, if you're working a line at a time, then the output probably comes a line at a time.
Working on that basis, then:
awk 'NR == 1 { for (i = 1; i < NF; i += 2)
                 { cols[(i+1)/2,1] = $i; cols[(i+1)/2,2] = $(i+1); }
               next
             }
             { for (i = 1; i < NF; i += 2)
                 { if ($i == 1 && $(i+1) == 2)
                     print cols[(i+1)/2,1], cols[(i+1)/2,2], NR - 1
                 }
             }'
The NR == 1 block of code captures the headings so they can be used in the main printing code; there are plenty of other ways to store that information too. The other block of code looks at the data lines, checks whether each pair of fields contains 1 2, and prints the pair's headings and the data row number (NR - 1) if there is a match. Because NF will be an even number but the loops count on the odd numbers, the < comparison is OK. Often in awk, you use for (i = 1; i <= NF; i++) with a single increment, and then <= is required for correct behaviour.
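The loop bounds can be checked directly. This sketch (the helper name pairs_visited is hypothetical) lists the field pairs that a step-2 loop visits on a given line, using either i < NF as in the answers or i <= NF for comparison. With an even number of fields both bounds visit the same pairs; with an odd number, i <= NF also visits a final pair whose second field does not exist.

```shell
#!/bin/sh
# List the (i, i+1) field pairs visited by a step-2 loop over one input line.
# "lt" uses i < NF (as in the answers); "le" uses i <= NF for comparison.
pairs_visited() {
    echo "$2" | awk -v op="$1" '{
        s = ""
        for (i = 1; (op == "lt") ? (i < NF) : (i <= NF); i += 2)
            s = s (s == "" ? "" : " ") i "-" (i + 1)
        print s
    }'
}

pairs_visited lt "A B A C"      # prints 1-2 3-4
pairs_visited le "A B A C D"    # prints 1-2 3-4 5-6
```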
For your minimal data set, this produces:
A B 2
A C 3
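To make the "run it once" point concrete, here is a sketch that runs the same pair test two ways on the question's sample data: once per data row, roughly as the question's shell loop intended (passing the row number in with -v), and once over the whole file. The names pairs.txt, prog, only, per_row and single_pass are all hypothetical. Both produce the same lines, but the per-row version starts one awk process per row and reads the file again each time.

```shell
#!/bin/sh
# Sample data from the question; "pairs.txt" is a hypothetical file name.
cat > pairs.txt <<'EOF'
A B A C
1 1 1 0
1 2 1 1
1 0 1 2
EOF

# Shared awk program. Line 1 supplies the sample IDs; when the variable
# "only" is set, every other line except file line number "only" is skipped.
prog='
NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }
only != "" && NR != only + 0 { next }
{
    for (i = 1; i < NF; i += 2)
        if ($i == 1 && $(i+1) == 2) print name[i], name[i+1], NR - 1
}'

# Row by row: one awk process per data row; data row r is file line r + 1.
per_row() {
    for r in 1 2 3; do
        awk -v only="$((r + 1))" "$prog" pairs.txt
    done
}

# Single pass: one awk process handles every row.
single_pass() {
    awk "$prog" pairs.txt
}

single_pass
```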
For this larger data set:
A B A C
1 1 1 0
1 2 1 1
1 0 1 2
1 2 4 2
5 3 1 9
7 0 3 2
1 2 1 0
9 0 1 2
1 2 3 2
the code produces:
A B 2
A C 3
A B 4
A B 7
A C 8
A B 9
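If other value combinations become of interest, the pair values need not be hard-coded. The sketch below is a hypothetical variant of the script above: want1 and want2 are illustrative variable names passed in with -v, the header is stored in two plain arrays instead of cols, and the larger data set is fed in on standard input. With 1 and 2 it prints the same six lines as above.

```shell
#!/bin/sh
# Hypothetical variant: the wanted values are passed in with -v
# (want1/want2 are illustrative names) instead of being hard-coded as 1 and 2.
pair_hits() {
    awk -v want1="$1" -v want2="$2" '
    NR == 1 {                           # header line: keep both IDs of each pair
        for (i = 1; i < NF; i += 2) { left[i] = $i; right[i] = $(i+1) }
        next
    }
    {                                   # data lines: test fields (1,2), (3,4), ...
        for (i = 1; i < NF; i += 2)
            if ($i == want1 && $(i+1) == want2)
                print left[i], right[i], NR - 1
    }'
}

# The larger data set from above, header line first.
data='A B A C
1 1 1 0
1 2 1 1
1 0 1 2
1 2 4 2
5 3 1 9
7 0 3 2
1 2 1 0
9 0 1 2
1 2 3 2'

printf '%s\n' "$data" | pair_hits 1 2    # same six lines as above
printf '%s\n' "$data" | pair_hits 1 0    # A C 1, A B 3, A C 7
```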
Upvotes: 1