Kadir

Reputation: 1615

Using Bash array in AWK

I have two files as follows:

file1:

3 1
2 4
2 1

file2:

23
9
7
45

The second field of file1 is used to specify the line of file2 that contains the number to be retrieved and printed. In the desired output, the first field of file1 is printed and then the retrieved field is printed.

Desired output file:

3 23
2 45
2 23

Here is my attempt to solve this problem:

IFS=$'\r\n' baf2=($(cat file2));echo;awk -v av="${baf2[*]}"  'BEGIN {split(av, aaf2, / /)}{print $1, aaf2[$2]}' file1;echo;echo ${baf2[*]}

However, this script cannot use the Bash array baf2.

The solution must be efficient since file1 has billions of lines and file2 has millions of lines in the real case.

Upvotes: 2

Views: 418

Answers (3)

Jotne

Reputation: 41456

You can use this awk command:

awk 'FNR==NR {a[NR]=$1;next} {print $1,a[$2]}' file2 file1
3 23
2 45
2 23

Store file2 in array a, indexed by line number.
Then print field 1 from file1 and use field 2 as the index to look up in the array.
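To try this out, here is a minimal sketch that recreates the sample files from the question in a scratch directory and runs the same program spread over several lines with comments. The temporary directory and the `out` variable are only there for illustration and are not part of the original command.

```shell
#!/bin/sh
# Recreate the sample input from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' '3 1' '2 4' '2 1' > "$tmpdir/file1"
printf '%s\n' 23 9 7 45 > "$tmpdir/file2"

out=$(awk '
    # First file on the command line (file2): NR and FNR are still equal,
    # so store each value under its line number and move on to the next line.
    FNR == NR { a[NR] = $1; next }
    # Second file (file1): print field 1, then the file2 value found on line $2.
    { print $1, a[$2] }
' "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$out"
rm -rf "$tmpdir"
```

Only file2 is held in memory; file1 is streamed line by line, which is what matters when file1 is the much larger file.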

Upvotes: 1

BMW

Reputation: 45243

Using awk

1) Print all lines of file1, whether or not there is a match:

awk 'NR==FNR{a[NR]=$1;next}{print $1,a[$2]}' file2 file1

2) Print only the lines that have a match:

awk 'NR==FNR{a[NR]=$1;next}$2=a[$2]' file2 file1
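The difference between the two variants only shows up when a lookup fails, which the sample data never does. The sketch below uses made-up input (not from the question) where file1 asks for a line that file2 does not have, and where one file2 value is 0. It assumes a POSIX-conforming awk; the variable names `all` and `matched` are illustrative.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
# Hypothetical data: file2 has a 0 on line 2, and file1 asks for line 9,
# which does not exist in file2.
printf '%s\n' 23 0 7 > "$tmpdir/file2"
printf '%s\n' '3 1' '2 2' '5 9' > "$tmpdir/file1"

# Variant 1: every line of file1 is printed; a failed lookup gives an empty field.
all=$(awk 'NR==FNR{a[NR]=$1;next}{print $1,a[$2]}' "$tmpdir/file2" "$tmpdir/file1")

# Variant 2: the assignment doubles as the pattern, so a line is printed
# only when the assigned value is non-empty and not numerically zero.
matched=$(awk 'NR==FNR{a[NR]=$1;next}$2=a[$2]' "$tmpdir/file2" "$tmpdir/file1")

printf 'variant 1:\n%s\n' "$all"
printf 'variant 2:\n%s\n' "$matched"
rm -rf "$tmpdir"
```

With this input, variant 1 prints three lines (the last one with an empty second field), while variant 2 prints only `3 23`. The `5 9` line is dropped because there is no line 9, and the `2 2` line is dropped because the value it looks up is 0, so variant 2 is only safe when file2 never contains 0.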

Upvotes: 1

Mark Setchell

Reputation: 207465

This has a similar basis to Jotne's solution, but loads file2 into memory first (since it is smaller than file1):

awk 'FNR==NR{x[FNR]=$0;next}{print $1 FS x[$2]}' file2 file1

Explanation

The FNR==NR condition means that the block that follows it in curly braces is only executed while reading file2, not file1. As each line of file2 is read, it is saved in array x[], indexed by the current line number, and the next statement then skips the rest of the program for that line. The second block in curly braces is therefore executed only for the lines of file1: it prints the first field of the line, followed by the field separator (a space), followed by the entry in x[] indexed by the second field of the line.
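One edge case of the FNR==NR test is worth knowing: if file2 is empty, no records are read from it, so FNR==NR is still true while file1 is being read and nothing is printed. The sketch below shows this with a made-up two-line file1 and compares it with a variant that tests `FILENAME==ARGV[1]` instead. That variant is a suggested alternative, not part of the answer above, and the variable names `naive` and `robust` are illustrative.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
printf '%s\n' '3 1' '2 4' > "$tmpdir/file1"
: > "$tmpdir/file2"    # empty lookup file

# With an empty file2, FNR==NR also holds for file1, so every line of
# file1 is stored in x[] and skipped; nothing is printed.
naive=$(awk 'FNR==NR{x[FNR]=$0;next}{print $1 FS x[$2]}' "$tmpdir/file2" "$tmpdir/file1")

# Comparing the current file name with the first operand does not depend
# on how many records have been read so far.
robust=$(awk 'FILENAME==ARGV[1]{x[FNR]=$0;next}{print $1 FS x[$2]}' "$tmpdir/file2" "$tmpdir/file1")

printf 'FNR==NR:           [%s]\n' "$naive"
printf 'FILENAME==ARGV[1]: [%s]\n' "$robust"
rm -rf "$tmpdir"
```

Here `naive` is empty, while `robust` contains both lines of file1, each with an empty lookup value after the separator.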

Upvotes: 1
