How to compare two similar files

Question

I have two text files like this:

Each line has the form SITE.MACHINE.VARIABLE_NAME=VARIABLE_VALUE, for example:

CPM-NOMINAL.WAC12.CHRONO_SANSREPONSE_KEEPALIVE=0
CPM-NOMINAL.WAC13.CHRONO_SANSREPONSE_KEEPALIVE=0
DEMO-WEB.WAC7.XN_TCP_SERVICE_PDD_PORT=32099
...

Both files are already sorted with sort -u.

I need to find which lines exist only in one file or only in the other, and which have been modified (I do not care about the common ones), much like the sdiff command shows. However, the files contain so many near-identical lines that diff gets confused and pairs up the wrong lines.

My idea is to compare the left side of "=" first and, if the keys match, then compare the right side. I am looking for a solution that prints output similar to sdiff.

Example of the wanted output:

File1                                                         | File2
CPM-NOMINAL.WAC10.SAR_PARI_SUJET_A_COTES="1:0:1:1:0:0:0:0:0"  | CPM-NOMINAL.WAC10.SAR_PARI_SUJET_A_COTES="1:0:1:1:0:0:0:1:0"
CPM-NOMINAL.WAC12.CHRONO_SANSREPONSE_KEEPALIVE=1              | CPM-NOMINAL.WAC12.CHRONO_SANSREPONSE_KEEPALIVE=0
CPM-NOMINAL.WAC12.PARIS_SANSREPONSE_KEEPALIVE=1               | CPM-NOMINAL.WAC12.PARIS_SANSREPONSE_KEEPALIVE=0
CPM-NOMINAL.WAC12.PARIS_SANS_EMISSION_KEEPALIVE=1             | CPM-NOMINAL.WAC12.PARIS_SANS_EMISSION_KEEPALIVE=0
CPM-NOMINAL.WAC12.PROTOCOLE_PDD=2                             | CPM-NOMINAL.WAC12.PROTOCOLE_PDD=3
                                                              > CPM-NOMINAL.WAC7.SQL_PROC_INIT_XAPDD_MBN_TEST="p_initialiser"
CPM-NOMINAL.WAC8.FAIRE_VERIF_CHAINAGE=FALSE                   | CPM-NOMINAL.WAC8.FAIRE_VERIF_CHAINAGE=TRUE
DEMO-WEB.WAC7.XN_TCP_SERVICE_PDD_PORT=3201                    | DEMO-WEB.WAC7.XN_TCP_SERVICE_PDD_PORT=32099
DEMO-WEB.WAC7.XN_TCP_SERVICE_SAR_PORT=3201                    | DEMO-WEB.WAC7.XN_TCP_SERVICE_SAR_PORT=3204

Thank you.

kvantour · Accepted Answer

Here is a possible way of doing this with traditional tools and pipelines. I use the terms key and value, since each line looks like

key = value

Each of the following commands answers one part of the question:

# lines common to file1 and file2 (-F: fixed strings, -x: whole-line match only)
grep -x -F -f file1 file2
# lines in file2 but not in file1
grep -v -x -F -f file1 file2
# each key pattern keeps its trailing "=", so KEY does not also match KEY_SUFFIX
# keys whose value changed (prints the file2 version; <(...) needs bash, ksh or zsh)
sed 's/=.*/=/' file1 | grep -F -f - <(grep -v -x -F -f file1 file2)
# keys in file2 but not in file1
sed 's/=.*/=/' file1 | grep -v -F -f - file2
# keys in file1 but not in file2
sed 's/=.*/=/' file2 | grep -v -F -f - file1
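Because both files are already sorted with sort -u, the two whole-line lists can also be obtained with comm, which is not part of the original answer. A minimal sketch, assuming a POSIX shell: compare_lines is a hypothetical helper name, the sample lines are taken from the question, and each file is re-sorted into a temporary copy because comm expects input sorted in the current locale's collation order.

```shell
#!/bin/sh
# Sketch: whole-line differences between two files using comm.
# compare_lines is a hypothetical helper name.
compare_lines() {
    cl_tmp=$(mktemp -d) || return 1
    # comm needs both inputs sorted in the current locale's collation order
    sort "$1" > "$cl_tmp/a"
    sort "$2" > "$cl_tmp/b"
    echo "only in $1:"
    comm -23 "$cl_tmp/a" "$cl_tmp/b"   # suppress columns 2 and 3
    echo "only in $2:"
    comm -13 "$cl_tmp/a" "$cl_tmp/b"   # suppress columns 1 and 3
    rm -r "$cl_tmp"
}

# Demo on sample lines from the question, in a throwaway directory.
demo=$(mktemp -d)
printf '%s\n' \
    'CPM-NOMINAL.WAC12.CHRONO_SANSREPONSE_KEEPALIVE=0' \
    'DEMO-WEB.WAC7.XN_TCP_SERVICE_PDD_PORT=3201' > "$demo/file1"
printf '%s\n' \
    'CPM-NOMINAL.WAC12.CHRONO_SANSREPONSE_KEEPALIVE=0' \
    'CPM-NOMINAL.WAC13.CHRONO_SANSREPONSE_KEEPALIVE=0' \
    'DEMO-WEB.WAC7.XN_TCP_SERVICE_PDD_PORT=32099' > "$demo/file2"
( cd "$demo" && compare_lines file1 file2 )
rm -r "$demo"
```

Like the grep commands, this compares whole lines, so a changed value appears once in each list; the key-based commands are what pair the two versions up.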

Alternatively, a single awk program can do it all. It is not the most optimised approach, but it gives a clean output:

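$ awk '
    BEGIN{FS=" *= *"}
    {
        key = $1
        # value is everything after the first "=", so values containing "=" stay whole
        value = $0; sub(/^[^=]*= */, "", value)
    }
    # NR==FNR holds only while reading the first file (assumes file1 is not empty)
    (NR==FNR){a[key]=value; next}
    {b[key] = value}
    END {
       # keys present in both files: identical (COMM) or changed (DIFF)
       for (key in a) if (key in b) {
           print (a[key] == b[key] ? "COMM" : "DIFF"), key,"=",a[key],"<=>",b[key]
           delete a[key]
           delete b[key]
       }
       # whatever remains is unique to one of the files
       for (key in a) {
           print "UNI1", key,"=",a[key]
       }
       for (key in b) {
           print "UNI2", key,"=",b[key]
       }
    }' file1 file2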

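This produces output like the following. Because for (key in a) visits keys in an unspecified order, pipe the result through sort if you need the lines ordered: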

 COMM key1 = val1 <=> val1
 COMM key2 = val2 <=> val2
 DIFF key3 = val31 <=> val32
 COMM key4 = val4 <=> val4
 UNI1 key5 = val5
 UNI2 key6 = val6
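The question asks only for differences, in an sdiff-like layout. A possible variation on the same key/value idea, not part of the original answer: it splits each line at the first "=" only, skips identical lines, marks changed keys with |, keys only in the first file with < and keys only in the second with > (mirroring sdiff's markers), and sorts the rows by key because awk's for-in order is unspecified. keydiff is a hypothetical function name, the 60-column width is an arbitrary choice, and the sample data is adapted from lines in the question.

```shell
#!/bin/sh
# Sketch: sdiff-like report between two KEY=VALUE files.
#   |  key present in both files with a different value
#   <  key only in the first file
#   >  key only in the second file
# Identical lines are not printed. keydiff is a hypothetical helper name.
keydiff() {
    printf '%-60s | %s\n' "$1" "$2"     # header row
    awk '
        # Build one output row; the 60-column width is an arbitrary choice.
        function row(l, m, r) {
            return (r == "") ? sprintf("%-60s %s", l, m) : sprintf("%-60s %s %s", l, m, r)
        }
        {
            i = index($0, "=")          # split on the FIRST "=" only
            if (i == 0) next            # ignore lines without a key
            key = substr($0, 1, i - 1)
            # Comparing FILENAME (not NR==FNR) also works when file1 is empty.
            if (FILENAME == ARGV[1]) a[key] = $0
            else                     b[key] = $0
        }
        END {
            # Prefix each row with "key<TAB>" so the rows can be sorted by key.
            for (k in a) {
                if (!(k in b))         print k "\t" row(a[k], "<", "")
                else if (a[k] != b[k]) print k "\t" row(a[k], "|", b[k])
            }
            for (k in b)
                if (!(k in a))         print k "\t" row("", ">", b[k])
        }' "$1" "$2" | LC_ALL=C sort | cut -f2-
}

# Demo on sample lines adapted from the question, in a throwaway directory.
demo=$(mktemp -d)
printf '%s\n' \
    'CPM-NOMINAL.WAC12.PROTOCOLE_PDD=2' \
    'CPM-NOMINAL.WAC8.FAIRE_VERIF_CHAINAGE=FALSE' \
    'DEMO-WEB.WAC7.XN_TCP_SERVICE_PDD_PORT=3201' > "$demo/file1"
printf '%s\n' \
    'CPM-NOMINAL.WAC12.PROTOCOLE_PDD=3' \
    'CPM-NOMINAL.WAC7.SQL_PROC_INIT_XAPDD_MBN_TEST="p_initialiser"' \
    'CPM-NOMINAL.WAC8.FAIRE_VERIF_CHAINAGE=FALSE' \
    'DEMO-WEB.WAC7.XN_TCP_SERVICE_PDD_PORT=32099' > "$demo/file2"
( cd "$demo" && keydiff file1 file2 )
rm -r "$demo"
```

On this sample, PROTOCOLE_PDD and XN_TCP_SERVICE_PDD_PORT are printed side by side with |, the SQL_PROC_INIT_XAPDD_MBN_TEST key (present only in file2) gets >, and the identical FAIRE_VERIF_CHAINAGE line is skipped.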
