mohammad

Reputation: 531

Using an array in AWK when working with two files

I have two files that I merged on a key field:

file1
-------------------------------
1      a      t      p      bbb  
2      b      c      f      aaa  
3      d      y      u      bbb  
2      b      c      f      aaa  
2      u      g      t      ccc  
2      b      j      h      ccc

file2
--------------------------------
1   11   bbb  
2   22   ccc  
3   33   aaa  
4   44   aaa  

I merged these two files on the key using the code below:

awk 'NR==FNR{a[$3]=$0;next;}{for(x in a){if(x==$5) print $1,$2,$3,$4,a[x]};  

My question is how I can save $2 of file2 in a variable or array and print it again after a[x].
My desired result is:

1 a t p 1   11  bbb  11  
2 b c f 3   33  aaa  33  
2 b c f 4   44  aaa  44  
3 d y u 1   11  bbb  11  
2 b c f 3   33  aaa  33  
2 b c f 4   44  aaa  44  
2 u g t 2   22  ccc  22  
2 b j h 2   22  ccc  22  

As you can see, the first 7 columns are the result of my merge code. I need to add the last column (field 2 of a[x]) to my result.

Important:

My next question: if I have an .awk file, how can I use shell features such as a pipe (| column -t) or redirecting the result to a file (awk ... > result.txt)? I always use these at the command prompt. Can I use them inside the code in my .awk file?

Upvotes: 2

Views: 3116

Answers (3)

Steve

Reputation: 54402

Simply add all of file2 to an array, and use split to hold the bits you want:

awk 'FNR==NR { two[$0]++; next } { for (i in two) { split(i, one); if (one[3] == $NF) print $1,$2,$3,$4, i, one[2] } }' file2 file1

Results:

1 a t p 1   11   bbb   11
2 b c f 3   33   aaa   33
2 b c f 4   44   aaa   44
3 d y u 1   11   bbb   11
2 b c f 3   33   aaa   33
2 b c f 4   44   aaa   44
2 u g t 2   22   ccc   22
2 b j h 2   22   ccc   22

Regarding your last question: you can also use pipes and file redirection inside your awk script. Here's an example of a pipe to column -t:

Contents of script.awk:

FNR==NR { 
    two[$0]++
    next
}

{
    for (i in two) {
        split(i, one)
        if (one[3] == $NF) { 
            print $1,$2,$3,$4, i, one[2] | "column -t"
        }
    }
}

Run like: awk -f script.awk file2 file1
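
A write to a file works the same way as the pipe. As a minimal sketch (write.awk and result.txt are names assumed here for illustration, and the sample files are recreated with single spaces between fields), the print statement is redirected with > inside the script:

```shell
# Work in a scratch directory and recreate the sample inputs from the question.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '1 a t p bbb' '2 b c f aaa' '3 d y u bbb' \
              '2 b c f aaa' '2 u g t ccc' '2 b j h ccc' > file1
printf '%s\n' '1 11 bbb' '2 22 ccc' '3 33 aaa' '4 44 aaa' > file2

# write.awk and result.txt are assumed names, not from the question.
cat > write.awk <<'EOF'
FNR == NR { two[$0]++; next }      # first file (file2): remember each line
{
    for (i in two) {
        split(i, one)              # one[1..3] holds the fields of that line
        if (one[3] == $NF) {
            # ">" inside awk truncates the file once, then keeps appending
            print $1, $2, $3, $4, i, one[2] > "result.txt"
        }
    }
}
END { close("result.txt") }        # flush and close the output file
EOF

awk -f write.awk file2 file1
cat result.txt
```

Using >> instead of > should append to an existing result.txt rather than overwrite it on each run.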

EDIT:

Add the following to your shell script:

results=$(awk '

    FNR==NR {
        two[$0]++
        next
    }

    {
        for (i in two) {
            split(i, one)
            if (one[3] == $NF) {
                print $1,$2,$3,$4, i, one[2] | "column -t"
            }
        }
    }
' "$1" "$2")

echo "$results"

Run like:

./script.sh file2.txt file1.txt

Results:

1  a  t  p  1  11  bbb  11
2  b  c  f  3  33  aaa  33
2  b  c  f  4  44  aaa  44
3  d  y  u  1  11  bbb  11
2  b  c  f  3  33  aaa  33
2  b  c  f  4  44  aaa  44
2  u  g  t  2  22  ccc  22
2  b  j  h  2  22  ccc  22

Upvotes: 3

Jonathan Leffler

Reputation: 753950

Your current script is:

awk 'NR==FNR { a[$3]=$0; next }
             { for (x in a) { if (x==$5) print $1,$2,$3,$4,a[x] } }'

(Actually, the original is missing the second close brace for the second pattern/action pair.)

It seems that you process file2 before you process file1.

You shouldn't need the loop in the second block. And you can make life easier for yourself by using the field splitting in the first phase to keep the values you need:

awk 'NR==FNR { c1[$3] = $1; c2[$3] = $2; next }
             { print $1, $2, $3, $4, c1[$5], c2[$5], $5, c2[$5] }' file2 file1

You can upgrade that to check whether c1[$5] and c2[$5] are defined, presumably skipping the row if they are not.
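
One way to do that check, as a sketch only: use the `in` operator as the pattern, so a row is printed only when its key was seen in file2. The extra `9 z z z zzz` row below is a hypothetical addition to file1, included purely to show a row being skipped.

```shell
# Scratch directory with the sample data; the zzz row is a made-up extra
# whose key does not occur in file2.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '1 a t p bbb' '2 b c f aaa' '3 d y u bbb' \
              '2 b c f aaa' '2 u g t ccc' '2 b j h ccc' '9 z z z zzz' > file1
printf '%s\n' '1 11 bbb' '2 22 ccc' '3 33 aaa' '4 44 aaa' > file2

# ($5 in c1) is true only if file2 supplied that key; unlike a plain
# c1[$5] reference, the "in" test does not create an empty array element.
out=$(awk 'NR == FNR  { c1[$3] = $1; c2[$3] = $2; next }
           ($5 in c1) { print $1, $2, $3, $4, c1[$5], c2[$5], $5, c2[$5] }' file2 file1)
printf '%s\n' "$out"
```

Rows whose key is missing from file2 are dropped silently here; printing them to stderr instead would be a small change to the second rule.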

Given your input files, the output is:

1 a t p 1 11 bbb 11
2 b c f 4 44 aaa 44
3 d y u 1 11 bbb 11
2 b c f 4 44 aaa 44
2 u g t 2 22 ccc 22
2 b j h 2 22 ccc 22

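Give or take column spacing, that's what was requested, except that file2 has two rows with key aaa and only the last one (4 44 aaa) survives in the arrays, just as in your original script. Column spacing can be fixed by using printf instead of print, or setting OFS to tab, or ...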
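
Both spacing fixes can be sketched as follows. The field widths in the printf format are an assumption chosen to fit this sample data, not something taken from the question.

```shell
# Scratch directory with the sample data from the question.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '1 a t p bbb' '2 b c f aaa' '3 d y u bbb' \
              '2 b c f aaa' '2 u g t ccc' '2 b j h ccc' > file1
printf '%s\n' '1 11 bbb' '2 22 ccc' '3 33 aaa' '4 44 aaa' > file2

# Variant 1: set OFS so that print joins its arguments with tabs.
out_tab=$(awk 'BEGIN { OFS = "\t" }
               NR == FNR { c1[$3] = $1; c2[$3] = $2; next }
               { print $1, $2, $3, $4, c1[$5], c2[$5], $5, c2[$5] }' file2 file1)

# Variant 2: printf with left-justified, fixed-width fields (assumed widths).
out_fmt=$(awk 'NR == FNR { c1[$3] = $1; c2[$3] = $2; next }
               { printf "%-2s %-2s %-2s %-2s %-2s %-3s %-4s %s\n",
                        $1, $2, $3, $4, c1[$5], c2[$5], $5, c2[$5] }' file2 file1)

printf '%s\n' "$out_tab"
printf '%s\n' "$out_fmt"
```

The OFS variant needs no knowledge of the data widths, while printf gives exact control when the widths are known in advance.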

The c1 and c2 notation for columns 1 and 2 is OK for two columns. If you need more, then you should probably use the 2D array notation:

awk 'NR==FNR { for (i = 1; i <= NF; i++) col[i,$3] = $i; next }
             { print $1, $2, $3, $4, col[1,$5], col[2,$5], $5, col[2,$5] }' file2 file1

This produces the same output as before.
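
The arrays above keep only the last file2 row for each key, which is why both aaa rows in the output show 4 44. If every file2 row per key is wanted, as in the desired result, one possible sketch is to count the rows per key and index the saved fields by (key, count). The counter array n and loop variable k are names introduced here for illustration.

```shell
# Scratch directory with the sample data from the question.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '1 a t p bbb' '2 b c f aaa' '3 d y u bbb' \
              '2 b c f aaa' '2 u g t ccc' '2 b j h ccc' > file1
printf '%s\n' '1 11 bbb' '2 22 ccc' '3 33 aaa' '4 44 aaa' > file2

# n[key] counts the file2 rows seen for that key; c1/c2 are indexed by
# (key, occurrence) so rows sharing a key no longer overwrite each other.
out=$(awk 'NR == FNR { n[$3]++; c1[$3, n[$3]] = $1; c2[$3, n[$3]] = $2; next }
           { for (k = 1; k <= n[$5]; k++)
                 print $1, $2, $3, $4, c1[$5, k], c2[$5, k], $5, c2[$5, k] }' file2 file1)
printf '%s\n' "$out"
```

Because the inner loop runs over a counter rather than `for (x in array)`, the matches for a key come out in file2 order, and a key absent from file2 simply yields no output rows.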

Upvotes: 3

Birei

Reputation: 36262

To achieve what you ask, append the second field to the whole line when processing the first file, with a[$3]=$0 OFS $2. For your second question, awk has a variable that separates fields in the output, OFS; assign a tab to it and experiment. Your script would look like:

awk '
    BEGIN { OFS = "\t"; } 
    NR==FNR{
        a[$3]=$0 OFS $2;
        next;
    }
    {
        for(x in a){
            if(x==$5) print $1,$2,$3,$4,a[x]
        } 
    }
' file2 file1

That yields:

1       a       t       p       1   11   bbb    11
2       b       c       f       4   44   aaa    44
3       d       y       u       1   11   bbb    11
2       b       c       f       4   44   aaa    44
2       u       g       t       2   22   ccc    22
2       b       j       h       2   22   ccc    22

Upvotes: 2
