Reputation: 9036
I have two files to be compared:
"base" file:
1 a
2 b
3 c
4 d
5 e
6 f
7 g
8 h
9 i
"temp" file:
2.3
1.8
4.5
For comparison, the following code is used:
awk 'NR==FNR{A[$1]=$2;next} {i=int($1+.01)} i in A {print A[i]}' base temp
Therefore, it outputs:
b
a
d
As shown, even though the "temp" file contains decimal numbers, the corresponding letters are found and printed. However, with a larger file (e.g., more than a couple of thousand records in the "temp" file), the code always outputs 158 fewer rows than the "temp" file actually contains. I do not understand why this happens and would appreciate help working around it.
In the following example, "tmpctd" is the base file and "tmpsf" is the changing file.
awk 'NR==FNR{A[$1]=$2;next} {i=int($1+.01)} i in A {print A[i]}' tmpctd tmpsf
The above comparison produces 22623 rows, but "tmpsf" (i.e., the "temp" file) has 22781 rows, so 158 rows are missing after the comparison. For testing, the files are available here: https://file.io/pxi24ZtPt0kD and https://file.io/tHgdI3dkbKhr.
Any hints are welcome.
PS. I updated both links, sorry for that.
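Because the command prints exactly one line for each "temp" row whose integer key exists in the base file, the missing rows must be the ones for which the test "i in A" is false. The following is a minimal sketch for listing those rows with their line numbers. The sample data (including the value 7.2 and the trailing blank line) and the variable names dir and unmatched are hypothetical; the real files tmpctd and tmpsf would be substituted for the two paths.

```shell
# Hypothetical sample data; replace the two paths with tmpctd and tmpsf.
dir=$(mktemp -d)
printf '1 a\n2 b\n3 c\n' > "$dir/base"
printf '2.3\n1.8\n7.2\n\n' > "$dir/temp"

# Same key computation as the original command, but with the test negated:
# print each "temp" line whose integer key has NO entry in the base file.
# The brackets make blank or whitespace-only lines visible.
unmatched=$(awk 'NR==FNR{A[$1]=$2;next} {i=int($1+.01)} !(i in A){print FNR": ["$0"]"}' "$dir/base" "$dir/temp")
printf '%s\n' "$unmatched"
```

With the sample data this reports line 3 (key 7 is absent from the base file) and line 4 (a blank line, which evaluates to key 0). Blank lines and out-of-range values are only guesses at what the 158 rows might be; the listing itself would show the actual cause.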
Upvotes: 2
Views: 131
Reputation: 133428
Could you please try the following, written and tested with the shown samples in GNU awk.
awk '
FNR==NR{
  a[int($1)]
  next
}
($1 in a){
  print $2
}
' temp_file base_file
Explanation: a detailed explanation of the above.
awk '                      ##Start of the awk program.
FNR==NR{                   ##FNR==NR is TRUE only while the first file (temp_file) is being read.
  a[int($1)]               ##Create an entry in array a whose index is the integer part of the 1st field.
  next                     ##Skip all further statements for this line.
}
($1 in a){                 ##While reading base_file: check whether the 1st field is an index of array a.
  print $2                 ##If so, print the 2nd field of the current line.
}
' temp_file base_file      ##Input file names; temp_file must come first.
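To check the behaviour on the sample data from the question, here is a small self-contained sketch. The temporary directory and the variable names dir and result are assumptions made for illustration only.

```shell
# Recreate the sample files from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' '1 a' '2 b' '3 c' '4 d' '5 e' '6 f' '7 g' '8 h' '9 i' > "$dir/base_file"
printf '%s\n' 2.3 1.8 4.5 > "$dir/temp_file"

# First pass (temp_file): remember the integer part of each value as a key.
# Second pass (base_file): print the letter of every row whose number is a key.
result=$(awk 'FNR==NR{a[int($1)]; next} ($1 in a){print $2}' "$dir/temp_file" "$dir/base_file")
printf '%s\n' "$result"
```

Note that the output follows the order of base_file (a, b, d) rather than the order of temp_file (b, a, d), and each base row is printed at most once even if several values in temp_file share the same integer part. If one output line per temp_file row in the original order is required, the file order used in the question may be the better fit.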
Upvotes: 4