Reputation: 1615
I have two files as follows:
file1:
3 1
2 4
2 1
file2:
23
9
7
45
The second field of file1 specifies the line of file2 that contains the number to retrieve. In the desired output, the first field of file1 is printed, followed by the retrieved number.
Desired output file:
3 23
2 45
2 23
Here is my attempt to solve this problem:
IFS=$'\r\n' baf2=($(cat file2));echo;awk -v av="${baf2[*]}" 'BEGIN {split(av, aaf2, / /)}{print $1, aaf2[$2]}' file1;echo;echo ${baf2[*]}
However, this script cannot use the Bash array baf2.
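A likely cause, sketched below: the line `IFS=$'\r\n' baf2=(...)` has no command after the assignments, so the `IFS` setting stays in effect for the rest of the shell session. `"${baf2[*]}"` then joins the elements with the first character of `IFS`, a carriage return, so `split(av, aaf2, / /)` finds no space and puts the whole string into `aaf2[1]`. The values are the sample from file2; the names `joined_cr` and `joined_sp` are only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: "${arr[*]}" joins array elements with the FIRST character of IFS.
arr=(23 9 7 45)

IFS=$'\r\n'
joined_cr="${arr[*]}"   # elements separated by \r, as in the question
IFS=$' \t\n'            # restore the default IFS
joined_sp="${arr[*]}"   # elements separated by a single space

# split() on a space finds no separator in the CR-joined string,
# so everything lands in element 1; the space-joined string splits into 4.
awk -v av="$joined_cr" 'BEGIN { n = split(av, a, / /); print "cr-joined:", n }'
awk -v av="$joined_sp" 'BEGIN { n = split(av, a, / /); print "space-joined:", n, a[4] }'
```

Even with a space join, all of file2 would travel as one command-line argument. On Linux a single argument is typically capped at 128 KiB, so with millions of lines `awk -v` would likely fail with "Argument list too long"; reading file2 directly in awk avoids that limit.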
The solution must be efficient, since in the real case file1 has billions of lines and file2 has millions of lines.
Upvotes: 2
Views: 418
Reputation: 41456
You can use this awk command:
awk 'FNR==NR {a[NR]=$1;next} {print $1,a[$2]}' file2 file1
3 23
2 45
2 23
Store file2 in array a. Then print field 1 from file1 and use field 2 to look up the value in the array.
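Here is the same one-liner as a small, self-contained script run against the question's sample data, with comments on what each block does. The temporary directory and the `lookup` function name are only for this demo.

```shell
#!/usr/bin/env bash
# Sketch: the two-file awk lookup above, run on the question's sample data.
set -euo pipefail
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '3 1\n2 4\n2 1\n' > "$dir/file1"
printf '23\n9\n7\n45\n' > "$dir/file2"

lookup() {
  # FNR==NR holds only while reading the first file argument (file2):
  #   store its first field under its line number, then skip to the next line.
  # For each line of file1, print field 1 and the value stored for line $2.
  awk 'FNR==NR { a[NR] = $1; next }
       { print $1, a[$2] }' "$1" "$2"
}

lookup "$dir/file2" "$dir/file1"
```

Only file2 is held in memory; file1 is streamed one line at a time, which is what makes this approach suitable for a very large file1.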
Upvotes: 1
Reputation: 45243
Using awk
1) Print all lines in file1, whether or not there is a match:
awk 'NR==FNR{a[NR]=$1;next}{print $1,a[$2]}' file2 file1
2) Print matching lines only:
awk 'NR==FNR{a[NR]=$1;next}$2=a[$2]' file2 file1
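One caveat with version 2, sketched below with hypothetical data: the pattern is the assignment `$2=a[$2]` itself, so a line is printed only when the looked-up value is non-empty and non-zero. A line whose target in file2 holds `0` is dropped along with genuine misses. Testing membership with `($2 in a)` keeps those lines, and it also avoids creating an empty entry in `a` for every miss. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: "matching lines only" via the assignment pattern vs. an `in` test.
# Hypothetical data: file2 line 2 holds a 0, and file1's last line
# asks for line 9, which file2 does not have.
set -euo pipefail
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '3 1\n2 4\n8 2\n5 9\n' > "$dir/file1"
printf '23\n0\n7\n45\n' > "$dir/file2"

# The assigned value is the condition: empty or zero means "don't print",
# so "8 2" (value 0) is dropped together with the real miss "5 9".
matches_assign() {
  awk 'NR==FNR { a[NR] = $1; next } $2 = a[$2]' "$1" "$2"
}

# `in` checks whether the key exists without creating it,
# so the zero survives and only the real miss is skipped.
matches_in() {
  awk 'NR==FNR { a[NR] = $1; next } ($2 in a) { print $1, a[$2] }' "$1" "$2"
}

matches_assign "$dir/file2" "$dir/file1"   # prints: 3 23, 2 45
matches_in "$dir/file2" "$dir/file1"       # prints: 3 23, 2 45, 8 0
```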
Upvotes: 1
Reputation: 207465
This has a similar basis to Jotne's solution, but loads file2 into memory first (since it is smaller than file1):
awk 'FNR==NR{x[FNR]=$0;next}{print $1 FS x[$2]}' file2 file1
Explanation
The FNR==NR condition is true only while file2 (the first file argument) is being read, so the part in the first set of curly braces runs for file2 but not for file1. As each line of file2 is read, it is saved in array x[], indexed by the current line number. The part in the second set of curly braces runs for every line of file1: it prints the first field, followed by the field separator (a space by default), followed by the entry of x[] indexed by the second field of that line.
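The FNR==NR test relies on file2 contributing at least one record before file1 starts. A minimal sketch of the edge case this implies, assuming a hypothetical empty file2: NR and FNR then stay equal throughout file1, so its lines are stored instead of printed. Comparing `FILENAME` with `ARGV[1]` ties the first block to the first named file instead, at the cost of a string comparison per record. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: FNR==NR misfires when the first file is empty;
# FILENAME == ARGV[1] does not.
set -euo pipefail
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '3 1\n2 4\n2 1\n' > "$dir/file1"
: > "$dir/file2"   # hypothetical edge case: an empty lookup file

by_fnr() {
  awk 'FNR==NR { x[FNR] = $0; next } { print $1 FS x[$2] }' "$1" "$2"
}

by_filename() {
  awk 'FILENAME == ARGV[1] { x[FNR] = $0; next } { print $1 FS x[$2] }' "$1" "$2"
}

by_fnr "$dir/file2" "$dir/file1"        # prints nothing: file1 was treated as file2
by_filename "$dir/file2" "$dir/file1"   # prints "3 ", "2 ", "2 " (no lookups found)
```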
Upvotes: 1