swa_slam

Reputation: 27

compare two files

I have two files with different numbers of rows:

file 1:

31.32   29.15   46.77   106.40  11370
25.81   40.82   25.67   30.08   16365
27.11   42.32   14.48   50.04   18310.7
26.48   42.34   12.65   62.78   19607.5
24.48   46.00   17.16   11.86   22087.2
26.75   43.91   29.65   55.81   24032.7
30.91   34.85   15.25   50.93   26703
25.24   41.62   16.54   51.57   38032.9
23.48   41.97   17.33   50.88   48981.2
24.16   39.34   16.99   50.86   77513.4
22.90   41.59   19.76   50.31   135803
19.98   43.52   20.58   45.65   747049
19.96   43.64   20.43   45.37   809913
19.93   43.75   20.41   45.33   863931

and file 2:

12.4   -32.1    39.1    -44.9   135497.688
8.6    -38.6    39.3    -44.8   48981.191
1.0    -45.0    0.0     -54.0   45928.445
13.9   -70.1    39.4    -44.8   26702.982

I would like to compare these two files and produce the following output:

file 3

13.9  -70.1   30.91  34.85   39.4   -44.8   15.25   50.93   26702.982
8.6   -38.6   23.48  41.97   39.3   -44.8   17.33   50.88   48981.191

The problem is that the values in the respective columns of the two files do not match exactly. It is fine if they match within a certain error bound (e.g., +/- 1).


Annotating where values in file 3 come from, using F/R/C for File/Row/Column:

13.9  -70.1   30.91  34.85   39.4   -44.8   15.25   50.93   26702.982
2/4/1 2/4/2   1/7/1  1/7/2   2/4/3  2/4/4   1/7/3   1/7/4   2/4/5

8.6   -38.6   23.48  41.97   39.3   -44.8   17.33   50.88   48981.191
2/2/1 2/2/2   1/9/1  1/9/2   2/2/3  2/2/4   1/9/3   1/9/4   2/2/5

But:

Upvotes: 1

Views: 595

Answers (3)

clt60

Reputation: 63892

This:

# join needs its inputs sorted lexically on the key, so use sort -k5,5 in the C locale rather than sort -n
(export LC_ALL=C; join -1 5 -2 5 \
    <(<file2 awk '{printf "%s %s %s %s %d\n",$1,$2,$3,$4,int($5+0.5);}' | sort -k5,5) \
    <(<file1 awk '{printf "%s %s %s %s %d\n",$1,$2,$3,$4,int($5+0.5);}' | sort -k5,5)
) | awk '{print $2, $3, $6, $7, $4, $5, $8, $9, $1}'

produces this for your input:

13.9 -70.1 30.91 34.85 39.4 -44.8 15.25 50.93 26703
8.6 -38.6 23.48 41.97 39.3 -44.8 17.33 50.88 48981

The last column is rounded.

A more compact form:

cmd() {
    awk '{printf "%s %s %s %s %d\n",$1,$2,$3,$4,int($5+0.5);}' | sort -k5,5
}
(export LC_ALL=C; join -1 5 -2 5 <(<file2 cmd) <(<file1 cmd)) |
awk '{print $2, $3, $6, $7, $4, $5, $8, $9, $1}'
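
One caveat about matching on a rounded key: rounding is not the same test as "within +/- 1". Two values that are very close can still round to different integers and so fail to join. A minimal sketch of that effect, where `rounded_key` is a hypothetical helper that mirrors the `int($5+0.5)` expression, and 10.4 and 10.6 are made-up values, not taken from the files:

```shell
# rounded_key is a hypothetical helper: it applies the same int(x + 0.5) rounding as the awk step above
rounded_key() {
    awk -v x="$1" 'BEGIN { print int(x + 0.5) }'
}

rounded_key 10.4   # prints 10
rounded_key 10.6   # prints 11, although the two values differ by only 0.2
```

For the sample data this does not matter, since both matching pairs round to the same integer.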

Upvotes: 4

ripat

Reputation: 3236

Using only awk:

awk '
NR==FNR {a[int($5+0.5)] = $0; next}
(int($5+0.5) in a) {$0 = a[int($5+0.5)] " " $0; print $6,$7,$1,$2,$8,$9,$3,$4,$10}' file1 file2

If you need the result sorted, pipe the output into sort.
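
For example, assuming the order wanted is ascending by the last (ninth) column, which is only a guess at what "sorted" should mean here, the sort step might look like this. `sort_by_last_column` is a hypothetical helper name, and the two rows fed to it are the expected output rows from the question:

```shell
# sort_by_last_column is a hypothetical helper: numeric sort on field 9, read from stdin
sort_by_last_column() {
    LC_ALL=C sort -k9,9n
}

# Feed the two expected output rows in reverse order to show the effect
printf '%s\n' \
    '8.6 -38.6 23.48 41.97 39.3 -44.8 17.33 50.88 48981.191' \
    '13.9 -70.1 30.91 34.85 39.4 -44.8 15.25 50.93 26702.982' |
    sort_by_last_column
```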

Upvotes: 0

glenn jackman

Reputation: 246754

awk '
    function close_enough(v1, v2, delta) {
        delta = v1 - v2
        return (-1 <= delta && delta <= 1)
    }
    NR == FNR {
        key[$NF] = $0
        next
    }
    {
        for (val in key) {
            if (close_enough($NF,val)) {
                split(key[val], arr)
                print arr[1], arr[2], $1, $2, arr[3], arr[4], $3, $4, val
            }
        }
    }
' file2 file1 | column -t > file3
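
The script above hard-codes the tolerance as 1. A minimal sketch of the same loop with the bound passed in as a parameter follows; the function name `fuzzy_join` and the awk variable `tol` are assumptions made for illustration, not part of the answer. As in the original, rows of the first file are keyed by their last column, so two rows there with an identical last value would overwrite each other.

```shell
# fuzzy_join TOL KEYFILE OTHERFILE  (hypothetical helper, not from the answer)
fuzzy_join() {
    awk -v tol="$1" '
        # First file: remember each row under its last-column value
        NR == FNR { key[$NF] = $0; next }
        # Second file: compare the last column against every stored key
        {
            for (val in key) {
                delta = $NF - val
                if (delta < 0) delta = -delta
                if (delta <= tol) {
                    split(key[val], arr)
                    print arr[1], arr[2], $1, $2, arr[3], arr[4], $3, $4, val
                }
            }
        }
    ' "$2" "$3"
}

# Usage, with the files from the question:
#   fuzzy_join 1 file2 file1 | column -t > file3
```

Every row of the second file is checked against every stored key, so the cost grows with the product of the two row counts. That is fine for files of this size.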

Upvotes: 2
