Merlin Bonato
Merlin Bonato

Reputation: 13

How compare too similar files

I have two text files like this:

line are like => SITE.MACHINE.VARIABLE_NAME=VARIABLE_VALUE

CPM-NOMINAL.WAC12.CHRONO_SANSREPONSE_KEEPALIVE=0
CPM-NOMINAL.WAC13.CHRONO_SANSREPONSE_KEEPALIVE=0
DEMO-WEB.WAC7.XN_TCP_SERVICE_PDD_PORT=32099
...

They are already sort -u

I have to find out which lines are in one file or in another or have been modified (I do not care about the common ones), like sdiff command. But the files are have too similar lines that create the diff error.

I'm thinking of diff on the left side of "=" and, if ok, check for the right side. I am looking for a solution that prints an output like sdiff or kind of.

output wanted exemple :

File1                                                         | File2
CPM-NOMINAL.WAC10.SAR_PARI_SUJET_A_COTES="1:0:1:1:0:0:0:0:0"  | CPM-NOMINAL.WAC10.SAR_PARI_SUJET_A_COTES="1:0:1:1:0:0:0:1:0"
CPM-NOMINAL.WAC12.CHRONO_SANSREPONSE_KEEPALIVE=1              | CPM-NOMINAL.WAC12.CHRONO_SANSREPONSE_KEEPALIVE=0
CPM-NOMINAL.WAC12.PARIS_SANSREPONSE_KEEPALIVE=1               | CPM-NOMINAL.WAC12.PARIS_SANSREPONSE_KEEPALIVE=0
CPM-NOMINAL.WAC12.PARIS_SANS_EMISSION_KEEPALIVE=1             | CPM-NOMINAL.WAC12.PARIS_SANS_EMISSION_KEEPALIVE=0
CPM-NOMINAL.WAC12.PROTOCOLE_PDD=2                             | CPM-NOMINAL.WAC12.PROTOCOLE_PDD=3
                                                              > CPM-NOMINAL.WAC7.SQL_PROC_INIT_XAPDD_MBN_TEST="p_initialiser"
CPM-NOMINAL.WAC8.FAIRE_VERIF_CHAINAGE=FALSE                   | CPM-NOMINAL.WAC8.FAIRE_VERIF_CHAINAGE=TRUE
DEMO-WEB.WAC7.XN_TCP_SERVICE_PDD_PORT=3201                    | DEMO-WEB.WAC7.XN_TCP_SERVICE_PDD_PORT=32099
DEMO-WEB.WAC7.XN_TCP_SERVICE_SAR_PORT=3201                    | DEMO-WEB.WAC7.XN_TCP_SERVICE_SAR_PORT=3204

Thank you.

Upvotes: 1

Views: 77

Answers (2)

karakfa
karakfa

Reputation: 67497

something like this can be done with join

$ join -a1 -a2 -e"---" -t= -o1.1,1.2,2.2,2.1 file1 file2 | column -ts=

CPM-NOMINAL.WAC10.SAR_PARI_SUJET_A_COTES         "1:0:1:1:0:0:0:0:0"             "1:0:1:1:0:0:0:1:0"  CPM-NOMINAL.WAC10.SAR_PARI_SUJET_A_COTES
CPM-NOMINAL.WAC12.CHRONO_SANSREPONSE_KEEPALIVE   1                               0                    CPM-NOMINAL.WAC12.CHRONO_SANSREPONSE_KEEPALIVE
CPM-NOMINAL.WAC12.PARIS_SANSREPONSE_KEEPALIVE    1                               0                    CPM-NOMINAL.WAC12.PARIS_SANSREPONSE_KEEPALIVE
CPM-NOMINAL.WAC12.PARIS_SANS_EMISSION_KEEPALIVE  1                               0                    CPM-NOMINAL.WAC12.PARIS_SANS_EMISSION_KEEPALIVE
CPM-NOMINAL.WAC12.PROTOCOLE_PDD                  2                               3                    CPM-NOMINAL.WAC12.PROTOCOLE_PDD
---                                              ---                             "p_initialiser"      CPM-NOMINAL.WAC7.SQL_PROC_INIT_XAPDD_MBN_TEST
CPM-NOMINAL.WAC8.FAIRE_VERIF_CHAINAGE            FALSE                           TRUE                 CPM-NOMINAL.WAC8.FAIRE_VERIF_CHAINAGE
DEMO-WEB.WAC7.XN_TCP_SERVICE_PDD_PORT            3201                            32099                DEMO-WEB.WAC7.XN_TCP_SERVICE_PDD_PORT
DEMO-WEB.WAC7.XN_TCP_SERVICE_SAR_PORT            3201                            3204                 DEMO-WEB.WAC7.XN_TCP_SERVICE_SAR_PORT

common values can be eliminated with piping to awk '$2!=$3'

Upvotes: 2

kvantour
kvantour

Reputation: 26481

Here is a possible way of doing this with traditional tools and pipelines. I use the terminology key and value as the file looks like

key = value

The following list of commands give you possible answers:

# lines common between file1 and file2
grep -F -f file1 file2
# lines in file2 not in file1
grep -v -F -f file1 file2
# changed key values from file1 to file2
cut -d'=' -f1 file1 | grep -F -f - <(grep -v -F -f file1 file2)
# keys in file1 but not in file2
cut -d'=' -f1 file1 | grep -v -F -f - file2
# keys in file2 but not in file1
cut -d'=' -f1 file2 | grep -v -F -f - file1

Or you can just go for one simple awk, this is not the most optimised, but gives a clean output:

$ awk '
    BEGIN{FS=" *= *"}
    {key=$1;value=$2}
    (NR==FNR){a[key]=value; next}
    {b[key] = value }
    END {
       for (key in a) if (key in b) {
           print (a[key] == b[key] ? "COMM" : "DIFF"), key,"=",a[key],"<=>",b[key]
           delete a[key]
           delete b[key] 
       }
       for (key in a) {
           print "UNI1", key,"=",a[key]
       }
       for (key in b) {
           print "UNI2", key,"=",b[key]
       }
    }' file1 file2

This will produce some output looking like

 COMM key1 = val1 <=> val1
 COMM key2 = val2 <=> val2
 DIFF key3 = val31 <=> val32      
 COMM key4 = val4 <=> val4
 UNI1 key5 = val5
 UNI2 key6 = val6      

Upvotes: 1

Related Questions