Anna Turkiewicz
Anna Turkiewicz

Reputation: 33

Analysing two files using awk with if condition

I have two files. First contains names, numbers and days for all samples sam_name.csv

Number,Day,Sample
171386,0,38_171386_D0_2-1.raw
171386,0,38_171386_D0_2-2.raw
171386,2,30_171386_D2_1-1.raw
171386,2,30_171386_D2_1-2.raw
171386,-1,40_171386_D-1_1-1.raw
171386,-1,40_171386_D-1_1-2.raw

The second includes information about batches (last column) sam_batch.csv

Number,Day,Quar,Code,M.F,Status,Batch
171386,0,1,x,F,C,1
171386,1,1,x,F,C,2
171386,2,1,x,F,C,5
171386,-1,1,x,F,C,6

I would like to get the information about batches (using two condition number and day) and add it to the first file. I have used awk command to do that, but I am getting results only at one-time point (-1).

Here is my command:

awk -F"," 'NR==FNR{number[$1]=$1;day[$1]=$2;batch[$1]=$7; next}{if($1==number[$1] && $2==day[$1]){print $0 "," number[$1] "," day[$1] "," batch[$1]}}' sam_batch.csv sam_nam.csv 

Here are my results: (a file sam_name, number and day from file sam_batch (just to check if a condition is working) and batch number (a value which I need)

Number,Day,Sample,Number,Day, Batch
171386,-1,40_171386_D-1_1-1.raw,171386,-1,6
171386,-1,40_171386_D-1_1-2.raw,171386,-1,6
175618,-1,08_175618_D-1_1-1.raw,175618,-1,2

Upvotes: 3

Views: 130

Answers (1)

Andriy Makukha
Andriy Makukha

Reputation: 8304

Here I corrected your AWK code:

awk -F"," 'NR==FNR{
    number_day = $1 FS $2 
    batch[number_day]=$7 
    next
}
{
    number_day = $1 FS $2
    print $0 "," batch[number_day]
}' sam_batch.csv sam_name.csv 

Output:

Number,Day,Sample,Batch
171386,0,38_171386_D0_2-1.raw,1
171386,0,38_171386_D0_2-2.raw,1
171386,2,30_171386_D2_1-1.raw,5
171386,2,30_171386_D2_1-2.raw,5
171386,-1,40_171386_D-1_1-1.raw,6
171386,-1,40_171386_D-1_1-2.raw,6

(No need for double-checking if you understand how the script works.)


Here's another AWK solution (my original answer):

awk -v "b=sam_batch.csv" 'BEGIN {
    FS=OFS=","
    while(( getline line < b) > 0) {
        n = split(line,a)
        nd = a[1] FS a[2]
        nd2b[nd] = a[n]
    }
}
{ print $1,$2,$3,nd2b[$1 FS $2] }' sam_name.csv 

Both solutions parse file sam_batch.csv at the beginning to form a dictionary of (number, day) -> batch. Then they parse sam_name.csv, printing out the first three fields together with the "Batch" from another file.

Upvotes: 1

Related Questions