Compare two files using bash script and print detailed diff report

Question

I have two large files on a Unix system, each with thousands of rows and about 80 columns. I have sorted both files on the same group of unique keys so that the same rows are always compared. For ease of understanding, I am showing only 3 rows and 7 columns here.

File 1:

d_report_ref_date="2021-03-31" system_id="VTX" contract_id="1130" credit_line_cd="ABC123" contract_id="ABC123" src_system_id="PRA" entity_cd="U0525"     
d_report_ref_date="2021-03-31" system_id="VTX" contract_id="1130" credit_line_cd="ABC124" contract_id="ABC124" src_system_id="PRA" entity_cd="U0526"     
d_report_ref_date="2021-03-31" system_id="VTX" contract_id="1130" credit_line_cd="ABC125" contract_id="ABC125" src_system_id="PRA" entity_cd="U0527"

File 2:

d_report_ref_date="2021-03-31" system_id="VTX" contract_id="1130" credit_line_cd="ABC123" contract_id="ABC123" src_system_id="PRA" entity_cd="U0525"     
d_report_ref_date="2021-03-31" system_id="VTX" contract_id="1130" credit_line_cd="ABC124" contract_id="ABC124" src_system_id="PRB" entity_cd="V0528"    
d_report_ref_date="2021-03-31" system_id="VTX" contract_id="1130" credit_line_cd="ABC125" contract_id="ABC125" src_system_id="PRA" entity_cd="U0530"

Expected Output:

Mismatch in row 2 : file1.src_system_id=PRA file2.src_system_id=PRB, file1.entity_cd=U0526 file2.entity_cd=V0528 

Mismatch in row 3 : file1.entity_cd=U0527 file2.entity_cd=U0530

Is it possible to achieve this using bash scripting? I tried awk, but it isn't giving me the desired output:

paste -d' ' file1 file2| 
  awk -F' ' '{w=NF/2; 
              for(i=1;i<=w;i++) 
                 if($i!=$(i+w)) printf "%d %d %s %s", NR,i,$i,$(i+w); 
              print ""}'

Thanks in Advance !!!

Ed Morton · Accepted Answer

Using any awk in any shell on every Unix box:

$ cat tst.awk
BEGIN { FS="[= ]" }
NR==FNR {
    for (i=1; i

$ awk -f tst.awk file1 file2
Mismatch in row 2 : file1.src_system_id="PRA" file2.src_system_id="PRB", file1.entity_cd="U0526" file2.entity_cd="V0528"

Mismatch in row 3 : file1.entity_cd="U0527" file2.entity_cd="U0530"

The above assumes:

Your quoted strings cannot contain = or blanks
Every tag present in a row of file1 is also present in the same row of file2
The tags are always present in the same order in a given row
You can have multiple duplicate tags in a given row
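
The tst.awk listing above is cut off partway through its first loop. As a minimal sketch only, and not necessarily the original script, the following shows one way to get the output above under the four assumptions just listed. The array name val1 and the variables diffs and sep are illustrative choices, and the sample files are recreated from the question with trailing blanks omitted. Values are stored by row number and field position rather than by tag name, which is what lets a tag such as contract_id appear twice in a row.

```shell
#!/bin/sh
# Recreate the sample inputs from the question (trailing blanks omitted).
cat > file1 <<'EOF'
d_report_ref_date="2021-03-31" system_id="VTX" contract_id="1130" credit_line_cd="ABC123" contract_id="ABC123" src_system_id="PRA" entity_cd="U0525"
d_report_ref_date="2021-03-31" system_id="VTX" contract_id="1130" credit_line_cd="ABC124" contract_id="ABC124" src_system_id="PRA" entity_cd="U0526"
d_report_ref_date="2021-03-31" system_id="VTX" contract_id="1130" credit_line_cd="ABC125" contract_id="ABC125" src_system_id="PRA" entity_cd="U0527"
EOF
cat > file2 <<'EOF'
d_report_ref_date="2021-03-31" system_id="VTX" contract_id="1130" credit_line_cd="ABC123" contract_id="ABC123" src_system_id="PRA" entity_cd="U0525"
d_report_ref_date="2021-03-31" system_id="VTX" contract_id="1130" credit_line_cd="ABC124" contract_id="ABC124" src_system_id="PRB" entity_cd="V0528"
d_report_ref_date="2021-03-31" system_id="VTX" contract_id="1130" credit_line_cd="ABC125" contract_id="ABC125" src_system_id="PRA" entity_cd="U0530"
EOF

# Hypothetical reconstruction of tst.awk (assumption: not the original text).
cat > tst.awk <<'EOF'
# Split on "=" or a single blank: odd fields are tags, even fields are values.
BEGIN { FS = "[= ]" }

# First file: remember every value, keyed by row number and field position.
NR == FNR {
    for (i = 1; i < NF; i += 2) {
        val1[FNR, i] = $(i + 1)
    }
    next
}

# Second file: compare each value with the one stored for the same position.
{
    diffs = ""
    sep = ""
    for (i = 1; i < NF; i += 2) {
        if ($(i + 1) != val1[FNR, i]) {
            diffs = diffs sep "file1." $i "=" val1[FNR, i] " file2." $i "=" $(i + 1)
            sep = ", "
        }
    }
    if (diffs != "") {
        printf "Mismatch in row %d : %s\n", FNR, diffs
    }
}
EOF

awk -f tst.awk file1 file2
```

On the sample data this prints one line per mismatching row, with the values still wrapped in their double quotes because the quotes are part of each field. Only the first file is held in memory; the second is compared row by row as it is read.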
