Nithi
Nithi

Reputation: 1

How to Compare two file in unix and merge file 1 and 2 which includes both matching and non matching data

Kindly let me know how to compare 2 files and merge both matching and non matching lines. I have checked all the answers provide previously but nothing suits my requirement.Please find the Sample data sets below

Contents of file1.csv:

J2D     TEXAS        43988
J2D     AUSTIN       21305
J2D     CUPERTINO    378563
J2D     BELGIUM      569632
J2D     UK            0
J2D     US            8
J2D     INDIA         75321

Contents of file2.csv:

J2D     TEXAS        25463
J2D     AUSTIN       5986
J2D     CUPERTINO    0234
J2D     BELGIUM      123468
J2D     UK            5874
J2D     US            2365
J2D     IRAQ          8982

I have tried below command but it is not working in my scenario:

awk 'NR==FNR{a[$2]=$3;next;}{print $0 "    " ($2 in a ? a[$2] : "NA")}' file2.csv file1.csv

Output:

J2D     TEXAS        43988    25463
J2D     AUSTIN       21305    5986
J2D     CUPERTINO    378563    0234
J2D     BELGIUM      569632    123468
J2D     UK            0        5874
J2D     US            8        2365
J2D     INDIA         75321    NA

In the above result, you can see "IRAQ" from file2.csv is missing.

awk 'NR==FNR{a[$2]=$3;next;}{print $0 "    " ($2 in a ? a[$2] : "NA")}' file1.csv file2.csv

Output:

J2D     TEXAS        25463    43988
J2D     AUSTIN       5986    21305
J2D     CUPERTINO    0234    378563
J2D     BELGIUM      123468    569632
J2D     UK            5874    0
J2D     US            2365    8
J2D     IRAQ          8982    NA

In the above result you can see "INDIA" from file1.csv is missing

Below is the expected output. kindly help me to get the desired output

EXPECTED OUTPUT:

J2D     TEXAS      43988      25463
J2D     AUSTIN     21305      5986
J2D     CUPERTINO  378563     0234
J2D     BELGIUM    569632     123468
J2D     UK          0         5874
J2D     US          8         2365
J2D     INDIA       75321     NA
J2D     IRAQ        NA        8982

Upvotes: 0

Views: 360

Answers (2)

karakfa
karakfa

Reputation: 67467

awk to the rescue!

$ awk     '{k=$1 FS $2} 
   NR==FNR {a[k]=$3; next} 
           {print $0, (k in a)?a[k]:"NA"; delete a[k]} 
       END {for(k in a) print k,"NA",a[k]}' file2 file1 | column -t

J2D  TEXAS      43988   25463
J2D  AUSTIN     21305   5986
J2D  CUPERTINO  378563  0234
J2D  BELGIUM    569632  123468
J2D  UK         0       5874
J2D  US         8       2365
J2D  INDIA      75321   NA
J2D  IRAQ       NA      8982

Upvotes: 0

RomanPerekhrest
RomanPerekhrest

Reputation: 92854

paste + awk solution:

paste file1.csv file2.csv | awk '{ if($2==$5) { print $1,$2,$3,$6 }
      else { print $1,$2,$3,"NA","\n",$4,$5,"NA",$6 }}' | column -tx

The output:

J2D  TEXAS      43988   25463
J2D  AUSTIN     21305   5986
J2D  CUPERTINO  378563  0234
J2D  BELGIUM    569632  123468
J2D  UK         0       5874
J2D  US         8       2365
J2D  INDIA      75321   NA
J2D  IRAQ       NA      8982

Details:

  • paste file1.csv file2.csv - merge lines of files

  • if($2==$5) { print $1,$2,$3,$6 } - if files matched by the second column ($5 field points to the early second column of file2.csv)

  • print $1,$2,$3,"NA","\n",$4,$5,"NA",$6 - print unmacthed lines as separate lines with NA in relative positions

http://man7.org/linux/man-pages/man1/paste.1.html

Upvotes: 0

Related Questions