Tuyen Quang
Tuyen Quang

Reputation: 69

how to compare 2 csv file base on multiple columns awk?

I have 2 csv file with sample format as below ,, have ~ 5000 rows in each files :

file 1:

    EMPLOYEE_NUMBER,LAST_NAME,FIRST_NAME,MIDDLE_NAME,BRANCH,DEPARTMENT,LEVEL,POSITION,EMAIL_ADDRESS
110426,Balbon,Susan,Lagat,"abc Equity Ventures, Inc.",Group Internal Audit,Supervisor,I.S. Audit Supervisor,[email protected]
30083,Mendezona,Bingen,Roemer,"abc Equity Ventures, Inc.",Risk Management Office,Vice President,VP - AEV Security,[email protected]
110773,Casas,Joyce Grace,Bea,"abc Equity Ventures, Inc.",Tax Advisory and Compliance,Manager,Tax Counsel,[email protected]
286,Fernandez,Mark Brian,Tato,abc Foundation Inc.,Computer Services Division,Supervisor,Senior Applications Supervisor,[email protected]
291,Plando,Marilou,Polleros,"abc Equity Ventures, Inc.",Administration,Assistant Vice President,AVP - Risk Management,[email protected]
110813,Gemelo-Abarca,Therese Xyza,Dableo,"abc Equity Ventures, Inc.",Governance & Compliance Team,Manager,Associate General Counsel - Corporate Secretarial and Compliance,[email protected]
30096,Abay,Joanna Marie,Saluria,"abc Equity Ventures, Inc.",Tax Advisory and Compliance,Supervisor,Tax Compliance Officer,[email protected]
110711,Ostan,Margilyn,Salibio,"abc Equity Ventures, Inc.",Accounting,Staff,Senior Accountant 1,[email protected]
110732,Fumar-Gonzales,Vanessa Concepcion,Altarejos,"abc Equity Ventures, Inc.",Legal and Corporate Services,Manager,Associate General Counsel - Labor & Litigation,[email protected]

file 2 :

    EMPLOYEE_NUMBER,LAST_NAME,FIRST_NAME,MIDDLE_NAME,BRANCH,DEPARTMENT,LEVEL,POSITION,EMAIL_ADDRESS
110426,Balbon,Susan,Lagat,"abc Equity Ventures, Inc.",Group Internal Audit,Supervisor,I.S. Audit Supervisor,[email protected]
30083,Mendezona,Bingen,Roemer,"abc Equity Ventures, Inc.",Security,Vice President,VP - AEV Security,[email protected]
110773,Casas,Joyce Grace,Bea,"abc Equity Ventures, Inc.",Tax Advisory and Compliance,Supervisor,Tax Counsel,[email protected]
286,Fernandez,Mark Brian,Tato,abc Foundation Inc.,Computer Services Division,Supervisor,Senior Applications Supervisor,[email protected]
291,Plando,Marilou,Polleros,"abc Equity Ventures, Inc.",Risk Management Office,Assistant Vice President,AVP - Risk Management,[email protected]
110866,Dugan,Belinda,Escultura,"abc Equity Ventures, Inc.",Legal Management,Vice President,Vice President for Legal Services Management,[email protected]
221,Montehermoso,Gladys,Enoy,"abc Equity Ventures, Inc.",Accounting,Staff,Senior Accountant,[email protected]
30102,Oblianda,Anna Cielo,Salud,"abc Equity Ventures, Inc.",Accounting,Supervisor,Accounting Supervisor,[email protected]
110499,Bucol,Charmaine Ann,Rebusa,"abc Equity Ventures, Inc.",Group Internal Audit,Staff,Audit Senior,[email protected]

I would like to file all row have same value in EMPLOYEE_NUMBER+EMAIL_ADDRESS columns but have different values in others columns using awk.

My ideal is merge 2 csv file base on columns EMPLOYEE_NUMBER+EMAIL_ADDRESS on vertical and remove duplicated row using awk. Thanks

Output will like that :

EMPLOYEE_NUMBER,LAST_NAME,FIRST_NAME,MIDDLE_NAME,BRANCH,DEPARTMENT,LEVEL,POSITION,EMAIL_ADDRESS
110773,Casas,Joyce Grace,Bea,"Aboitiz Equity Ventures, Inc.",Tax Advisory and Compliance,Manager,Tax Counsel,[email protected]
110773,Casas,Joyce Grace,Bea,"Aboitiz Equity Ventures, Inc.",Tax Advisory and Compliance,Supervisor,Tax Counsel,[email protected]

Upvotes: 0

Views: 572

Answers (1)

CWLiu
CWLiu

Reputation: 4043

Use simple awk_script can achieve that,

awk_script:

NR==FNR{
  if(FNR==1){print}
  a[$1 $2]=$0
  next
}
a[$1 $2]!=$0 && a[$1 $2]!=""{
  print a[$1 $2],$0
}

Command to be executed,

awk -F',' -v OFS="\n" -f awk_script file1 file2

Upvotes: 1

Related Questions