roddy

Reputation: 41

Creating a file in awk from the product of two input files

I am hoping for some help in awk to create a file from the product of two input files.

File 1 has 850,000 rows and 50,001 columns of SNP data. The first column is the id.

Example of 3 rows in File 1, showing the id and the first 4 SNPs:

A 1 2 1 2   
B 2 2 2 1  
C 1 1 1 1  

File 2 has 1 row of 50,000 SNP effects.

0.2 -0.1 0.4 0.5 

My desired output is the id followed by the sum of each SNP multiplied by its SNP effect, i.e.

A would be 1*0.2 + 2*-0.1 + 1*0.4 + 2*0.5 = 1.4

A 1.4
B 1.5
C 1  

Any help would be appreciated.

Roddy

Upvotes: 1

Views: 125

Answers (2)

hek2mgl

Reputation: 158100

You can use the following awk script:

awk 'FNR==NR{split($0,a);next}{t=0;for(i=2;i<=NF;i++){t+=$i*a[i-1]};print $1,t}' b.txt a.txt

It is easier to read as a multiline version:

calc.awk

# True for the first input file (the one with the factors)
# See: https://www.gnu.org/software/gawk/manual/html_node/Auto_002dset.html#Auto_002dset
FNR==NR{
    # split factors into array a  
    split($0,a)
    next
}
{
    t=0 # total
    # Iterate through fields
    for(i=2;i<=NF;i++){
        # ... and aggregate t 
        t+=$i*a[i-1]
    }
    print $1,t # Output the id along with t
}

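Call it like this, passing the file with the SNP effects (b.txt, your File 2) first and the SNP data (a.txt, your File 1) second: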

awk -f calc.awk b.txt a.txt
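To try this against the sample data from the question, here is a minimal sketch. It assumes a POSIX shell with `mktemp` available; the scratch directory and the `result` variable are illustrative names of mine, not part of the answer. The awk program is the same logic as the one-liner, written inline.

```shell
# Recreate the question's sample data in a scratch directory (illustrative paths).
tmpdir=$(mktemp -d)
printf '%s\n' 'A 1 2 1 2' 'B 2 2 2 1' 'C 1 1 1 1' > "$tmpdir/a.txt"
printf '%s\n' '0.2 -0.1 0.4 0.5' > "$tmpdir/b.txt"

# Effects file (b.txt) first, SNP data (a.txt) second.
result=$(awk '
FNR==NR { split($0, a); next }   # first file: load the effects into array a
{
    t = 0
    for (i = 2; i <= NF; i++) t += $i * a[i-1]   # field i pairs with effect i-1
    print $1, t
}' "$tmpdir/b.txt" "$tmpdir/a.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

On the sample rows this should print `A 1.4`, `B 1.5` and `C 1`, matching the desired output in the question.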

Upvotes: 3

Kent

Reputation: 195209

This awk one-liner should work for you:

 awk 'NR==FNR{split($0,a);next}{s=0;for(i=2;i<=NF;i++)s+=a[i-1]*$i;print $1,s}' file2 file1
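With 50,000 effects per row, a line that has a missing or extra field would still produce a sum, just a wrong one. The following is a sketch of one way to guard against that, assuming you would rather skip such rows and report them on stderr. The check, the warning text, the variables `n` and `msg`, and the deliberately short row B are my own additions for illustration and are not part of the one-liner above.

```shell
# Sample data in a scratch directory; row B is deliberately one SNP short.
tmpdir=$(mktemp -d)
printf '%s\n' 'A 1 2 1 2' 'B 2 2 2' 'C 1 1 1 1' > "$tmpdir/file1"
printf '%s\n' '0.2 -0.1 0.4 0.5' > "$tmpdir/file2"

result=$(awk '
NR==FNR { n = split($0, a); next }   # n = number of SNP effects read from file2
NF - 1 != n {                        # expect the id column plus exactly n SNP columns
    msg = "line " FNR ": expected " n " SNPs, found " (NF - 1)
    print msg | "cat 1>&2"           # portable way to write to stderr
    next                             # skip the malformed row
}
{ s = 0; for (i = 2; i <= NF; i++) s += a[i-1] * $i; print $1, s }
END { close("cat 1>&2") }            # wait for the stderr writer to finish
' "$tmpdir/file2" "$tmpdir/file1" 2>"$tmpdir/warnings")

warnings=$(cat "$tmpdir/warnings")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Here rows A and C are scored as before, while row B is skipped and a warning naming line 2 ends up in `warnings`.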

Upvotes: 3
