justaguy

Reputation: 3022

awk runs but resulting output is empty

The awk below runs, but the output file is 0 bytes. There are 4 input files of 21 to 259 records each, and each one is matched against a search file of 11,137,660 records. For every match, the script should output the average of $7 across the matching records. I cannot figure out why the output file is empty. Thank you :).

input

AGRN
CCDC39 
CCDC40 
CFTR

search

chr1    955543  955763  chr1:955543 AGRN-6|gc=75    1   0
chr1    955543  955763  chr1:955543 AGRN-6|gc=75    2   2
chr1    955543  955763  chr1:955543 AGRN-6|gc=75    3   2

expected output

chr1:955543 AGRN|gc=75 1.3

awk

 awk '
 NR == FNR {input[$0]; next}
 {
    split($5, a, "-")
    if (a[1] in input) {
         key = $4 OFS $5
         n[key]++
         sum[key] += $7
     }
 }
 END {
     for (key in n) 
         printf "%s %.1f\n", key, sum[key]/n[key]
 }
' search.txt input.txt > output.txt

Upvotes: 1

Views: 359

Answers (1)

Craig Estey

Reputation: 33601

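Because the search file comes first in ARGV, NR == FNR is true for search.txt rather than input.txt, so input holds no gene names while the search records are being read. With this argument order, you can't do the data matchup until END.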
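The effect of the argument order can be checked on the sample data from the question. The sketch below is only an illustration: the temporary directory and the variable names are my own, and it keys on $1 instead of $0 (an assumption on my part, so that the trailing blanks after CCDC39 and CCDC40 do not matter). It runs the same program with both orderings.

```shell
# Scratch copies of the sample files from the question (paths are illustrative).
dir=$(mktemp -d)
printf 'AGRN\nCCDC39 \nCCDC40 \nCFTR\n' > "$dir/input.txt"
cat > "$dir/search.txt" <<'EOF'
chr1    955543  955763  chr1:955543 AGRN-6|gc=75    1   0
chr1    955543  955763  chr1:955543 AGRN-6|gc=75    2   2
chr1    955543  955763  chr1:955543 AGRN-6|gc=75    3   2
EOF

# The question's program, keyed on $1 so trailing blanks are ignored.
prog='
NR == FNR { input[$1]; next }      # first file named on the command line
{
    split($5, a, "-")              # "AGRN-6|gc=75" -> a[1] = "AGRN"
    if (a[1] in input) {
        key = $4 OFS $5
        n[key]++
        sum[key] += $7
    }
}
END {
    for (key in n)
        printf "%s %.1f\n", key, sum[key] / n[key]
}'

# Search file first: input[] is filled from search.txt, so nothing matches.
broken=$(awk "$prog" "$dir/search.txt" "$dir/input.txt")
# Gene list first: input[] holds the gene names before any search row is read.
fixed=$(awk "$prog" "$dir/input.txt" "$dir/search.txt")

printf 'search first: [%s]\n' "$broken"
printf 'genes first:  [%s]\n' "$fixed"
rm -rf "$dir"
```

On the three sample rows, whose seventh column holds 0, 2 and 2, the first run prints nothing and the second prints chr1:955543 AGRN-6|gc=75 1.3. If you are free to reorder the arguments, that alone may be enough.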

Here's what I think will work. Based upon your test files, it produces a single line of output:

chr1:955543 AGRN-6|gc=75 0.7

Here is the script file, invoked with awk -f script.awk search.txt input.txt (it relies on ARGIND, which is a gawk extension, so awk must be GNU awk):

BEGIN {
    slen = 0;
}

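# get input file(s): every file after the first one on the command line
# NOTE: IMO, this is a cleaner test condition than comparing NR with FNR
ARGIND > 1 {
    ###printf("input_push: DEBUG %s\n",$0);
    # key on $1 rather than $0 so trailing blanks (e.g. "CCDC39 ") still match
    input[$1];
    next;
}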

# get single search list
{
    ###printf("search_push: DEBUG %s\n",$0);
    search[slen++] = $0;
    next;
}

END {
    # sum up data
    for (sidx = 0;  sidx < slen;  ++sidx) {
        sval = search[sidx];
        ###printf("search_end: DEBUG %s\n",sval);

        split(sval,sary)
        split(sary[5],a,"-");
        ###printf("search_end: DEBUG sary[5]='%s' a[1]='%s'\n",sary[5],a[1]);

        if (a[1] in input) {
            key = sary[4] OFS sary[5]
            n[key]++
            sum[key] += sary[7]
        }
    }

    for (key in n)
        printf "%s %.1f\n", key, sum[key]/n[key]
}
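For an awk without ARGIND, the same buffer-then-match idea can be sketched with FILENAME instead. This is a hedged variant, not a drop-in replacement: it assumes the search file is the first file operand (ARGV[1]), the file names and temporary directory are illustrative, and keying on $1 is my own choice to ignore trailing blanks in the gene list.

```shell
# Scratch copies of the sample files from the question (paths are illustrative).
dir=$(mktemp -d)
printf 'AGRN\nCCDC39 \nCCDC40 \nCFTR\n' > "$dir/input.txt"
cat > "$dir/search.txt" <<'EOF'
chr1    955543  955763  chr1:955543 AGRN-6|gc=75    1   0
chr1    955543  955763  chr1:955543 AGRN-6|gc=75    2   2
chr1    955543  955763  chr1:955543 AGRN-6|gc=75    3   2
EOF

cat > "$dir/portable.awk" <<'EOF'
BEGIN { slen = 0 }

# Any file other than the first operand is treated as a gene list.
FILENAME != ARGV[1] {
    if (NF) input[$1]          # skip blank lines, ignore trailing blanks
    next
}

# First operand: buffer every search row until the gene lists are loaded.
{ search[slen++] = $0 }

END {
    for (sidx = 0; sidx < slen; ++sidx) {
        split(search[sidx], sary)
        split(sary[5], a, "-")
        if (a[1] in input) {
            key = sary[4] OFS sary[5]
            n[key]++
            sum[key] += sary[7]
        }
    }
    for (key in n)
        printf "%s %.1f\n", key, sum[key] / n[key]
}
EOF

result=$(awk -f "$dir/portable.awk" "$dir/search.txt" "$dir/input.txt")
printf '%s\n' "$result"
rm -rf "$dir"
```

On the three sample rows, whose seventh column holds 0, 2 and 2, this prints chr1:955543 AGRN-6|gc=75 1.3. Keep in mind that buffering holds every search row in memory until END, which is a lot for an 11,137,660 record file.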

Upvotes: 2
