Gery

Reputation: 9036

Fewer rows than expected after comparing two files

I have two files to be compared:

  1. "base" file, from which I take the values in the second column after comparing it with the "temp" file
  2. "temp" file, which changes continuously (e.g., in every loop)

"base" file:

1 a
2 b
3 c
4 d
5 e
6 f
7 g
8 h
9 i

"temp" file:

2.3
1.8
4.5

For comparison, the following code is used:

awk 'NR==FNR{A[$1]=$2;next} {i=int($1+.01)} i in A {print A[i]}' base temp

Therefore, it outputs:

b
a
d

As shown, even though the "temp" file contains decimal numbers, the corresponding letters are found and printed. However, with a larger file (e.g., more than a couple of thousand records in the "temp" file), the code always outputs 158 rows fewer than the number of rows in the "temp" file. I do not understand why this happens and would appreciate help working around it.
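As a minimal sketch of how `int($1+.01)` turns each "temp" value into a lookup key: the first three values come from the sample "temp" file, while 2.995 is a hypothetical extra value, added only to show that anything within 0.01 below an integer is pushed up to the next key.

```shell
# 2.3, 1.8 and 4.5 are from the sample "temp" file; 2.995 is a
# hypothetical value added for illustration only.
# Each output line shows the original value and the key it maps to.
keys=$(printf '%s\n' 2.3 1.8 4.5 2.995 | awk '{ print $1, int($1 + .01) }')
printf '%s\n' "$keys"
```

This prints the pairs `2.3 2`, `1.8 1`, `4.5 4` and `2.995 3`, so the key is the truncated value, not the rounded one (1.8 maps to 1, which is why "a" is printed for it).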

In the following example, "tmpctd" is the base file and "tmpsf" is the changing file.

awk 'NR==FNR{A[$1]=$2;next} {i=int($1+.01)} i in A {print A[i]}' tmpctd tmpsf

The above comparison produces 22623 rows, but "tmpsf" (i.e., the "temp" file) has 22781 rows, so 158 rows are missing after comparing both files. For testing, please find these files here: https://file.io/pxi24ZtPt0kD and https://file.io/tHgdI3dkbKhr.
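The command prints a row only when the computed key exists in `A`, so any "temp" row whose key is absent from the base file is dropped silently. Whether that accounts for the 158 missing rows here is an assumption, but it can be checked with a sketch like the one below, which lists the rows that find no match. The sample data and the `workdir` directory are hypothetical; key 7 is deliberately left out of the base file.

```shell
# Hypothetical sample data in a scratch directory; key 7 is missing
# from "base" on purpose so that one "temp" row has no match.
workdir=$(mktemp -d)
printf '%s\n' '1 a' '2 b' '4 d' > "$workdir/base"
printf '%s\n' 2.3 1.8 7.2 4.5 > "$workdir/temp"

# Same key computation as in the question, but print only the "temp"
# rows whose key is NOT in the base file (line number, value, key).
unmatched=$(awk 'NR==FNR { A[$1] = $2; next }
                 { i = int($1 + .01) }
                 !(i in A) { print FNR ": " $0 " -> key " i }' \
            "$workdir/base" "$workdir/temp")
printf '%s\n' "$unmatched"
rm -rf "$workdir"
```

With this sample it prints `3: 7.2 -> key 7`. Run against "tmpctd" and "tmpsf", the number of lines it prints should equal the shortfall if unmatched keys are the cause.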

Any hints are welcome.

PS. I updated both links, sorry for that.

Upvotes: 2

Views: 131

Answers (1)

RavinderSingh13

Reputation: 133428

Could you please try the following, written and tested with the shown samples in GNU awk.

awk '
FNR==NR{
  a[int($1)]
  next
}
($1 in a){
  print $2
}
' temp_file base_file

Explanation: A detailed explanation of the above.

awk '                      ##Start of the awk program.
FNR==NR{                   ##FNR==NR is TRUE only while the first file (temp_file) is being read.
  a[int($1)]               ##Create an entry in array a, indexed by the integer part of the 1st field.
  next                     ##Skip the remaining statements for this line.
}
($1 in a){                 ##For base_file lines: check whether the 1st field is an index of array a.
  print $2                 ##Print the 2nd field of the current line.
}
' temp_file base_file      ##Input file names: temp_file is read first, then base_file.
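As a quick check of what this command prints, here is a sketch that runs it on the sample data from the question. The scratch directory `workdir` is a hypothetical name; the file contents are the samples shown above.

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' '1 a' '2 b' '3 c' '4 d' '5 e' '6 f' '7 g' '8 h' '9 i' > "$workdir/base_file"
printf '%s\n' 2.3 1.8 4.5 > "$workdir/temp_file"

# The answer's program: collect integer keys from temp_file, then print
# the 2nd field of every base_file line whose 1st field is one of them.
result=$(awk 'FNR==NR { a[int($1)]; next } ($1 in a) { print $2 }' \
         "$workdir/temp_file" "$workdir/base_file")
printf '%s\n' "$result"
rm -rf "$workdir"
```

On this sample it prints a, b, d: the output follows the order of base_file rather than temp_file, and because the keys are stored as array indices, a key that occurs several times in temp_file is printed only once.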

Upvotes: 4
