Rish

Reputation: 826

How to compare pairs of columns in awk?

I have the following dataset from a pairwise analysis (the first row contains just the sample IDs):

A B A C
1 1 1 0
1 2 1 1
1 0 1 2

I want to compare the values in field 1 and field 2, then field 3 and field 4, and print the row number (NR) every time the pair I am examining holds the combination 1 and 2.

For example for pairs A and B, I would want the output:

A B 2

For pairs A and C, I would want the output:

A C 3

I would want to proceed row by row so I would likely need the code to include:

for i in {1..3}; do
    awk 'NR=="'${i}'" {code}'
done

But I have no idea how to proceed in a pairwise fashion (i.e. compare field 1 with field 2, then field 3 with field 4, and so on).

How can I do this?

Upvotes: 0

Views: 196

Answers (2)

Ed Morton

Reputation: 203532

It's hard to say with such a minimal example but this MAY be what you want:

$ cat tst.awk
FNR==1 {
    for (i=1;i<=NF;i++) {
        name[i] = $i
    }
    next
}
{
    for (i=1;i<NF;i+=2) {
        if ( ($i == 1) && ($(i+1) == 2) ) {
            print name[i], name[i+1], NR-1
        }
    }
}

$ awk -f tst.awk file
A B 2
A C 3
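
As a side note, the pair loop in a script like this stops at i<NF rather than i<=NF. A minimal sketch of why, assuming a row with an odd number of fields (the sample line and the variable names line, strict and loose are made up for illustration): it only lists which field pairs each loop bound would visit.

```shell
# Hypothetical row: five fields, i.e. two complete pairs plus one unpaired field.
line='1 2 1 2 1'

# i < NF: the loop only starts a pair when a second field exists.
strict=$(printf '%s\n' "$line" | awk '{
    s = ""; sep = ""
    for (i = 1; i < NF; i += 2) { s = s sep i "-" (i+1); sep = " " }
    print s
}')

# i <= NF: the loop also starts a pair at the last field, so $(i+1) would be
# a field that does not exist (awk treats it as empty).
loose=$(printf '%s\n' "$line" | awk '{
    s = ""; sep = ""
    for (i = 1; i <= NF; i += 2) { s = s sep i "-" (i+1); sep = " " }
    print s
}')

echo "i <  NF visits: $strict"   # 1-2 3-4
echo "i <= NF visits: $loose"    # 1-2 3-4 5-6
```

With an even number of fields both bounds visit the same pairs, so the difference only shows up on malformed rows.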

Upvotes: 4

Jonathan Leffler

Reputation: 753845

You should certainly run the script only once; there's no need to invoke awk once per row. It isn't yet entirely clear how you want multiple matches printed. However, if you're working a line at a time, then the output probably comes out a line at a time too.

Working on that basis, then:

awk 'NR == 1 { for (i = 1; i < NF; i += 2)
               { cols[(i+1)/2,1] = $i; cols[(i+1)/2,2] = $(i+1); } 
               next
             }
             { for (i = 1; i < NF; i += 2)
               { if ($i == 1 && $(i+1) == 2)
                     print cols[(i+1)/2,1], cols[(i+1)/2,2], NR - 1
               }
             }'

The NR == 1 block captures the headings so they can be used in the main printing code; there are plenty of other ways to store that information. The other block looks at the data lines, checks whether each pair of fields contains 1 2, and prints the two headings and the data row number (NR - 1, because the heading line is not counted) if there is a match. Because NF is an even number and the loops step through the odd field numbers only, the < comparison is OK. More often in awk you write for (i = 1; i <= NF; i++), with a single increment, and then <= is required for correct behaviour.

For your minimal data set, this produces:

A B 2
A C 3

For this larger data set:

A B A C
1 1 1 0
1 2 1 1
1 0 1 2
1 2 4 2
5 3 1 9
7 0 3 2
1 2 1 0
9 0 1 2
1 2 3 2

the code produces:

A B 2
A C 3
A B 4
A B 7
A C 8
A B 9

Upvotes: 1
