ani0710
ani0710

Reputation: 266

merge two csv files according to matching rows and add new columns in linux

I am developing an application using java, but for that I need a csv file in some order. I dont know linux much, but wondering if some way is there to merge the csv files in the required format.

I have two csv files containing hundreds of thousands of records. a sample is below:

name,Direction,Date
abc,sent,Jan 21 2014 02:06 
xyz,sent,Nov 21 2014 01:09
pqr,sent,Oct 21 2014 03:06  

and

name,Direction,Date
abc,received,Jan 22 2014 02:06
xyz,received,Nov 22 2014 02:06

so, this second csv file would contain some records of file 1. What I need is a new csv like this:

name,Direction,Date,currentDirection,receivedDate
abc,sent,Jan 21 2014 02:06,received,Jan 22 2014 02:06
xyz,sent,Nov 21 2014 01:09,received,Nov 22 2014 02:06
pqr,sent,Oct 21 2014 03:06

Need to add the columns(4th and 5th column) according to the matching data in column1. if no matching data is there in the second file, the columns should be empty like above.

so is there a bash command in linux to achieve this?

Upvotes: 4

Views: 5672

Answers (2)

jftuga
jftuga

Reputation: 1963

You can use this join command to accomplish this. The first file is 1.csv and the second file is 2.csv

join -1 1 -2 1 -t, -a 1  1.csv 2.csv | sed "s/Direction,Date/currentDirection,receivedDate/2"

Output:

name,Direction,Date,currentDirection,receivedDate
abc,sent,Jan 21 2014 02:06,received,Jan 22 2014 02:06
xyz,sent,Nov 21 2014 01:09,received,Nov 22 2014 02:06
pqr,sent,Oct 21 2014 03:06 

Explanation:

You want to join on the first field in both files, therefore -1 1 -2 1

You want to use comma, therefore -t,

You want to display all unmatched records in file 1, therefore -a 1, you could also append -a 2 if needed.

The /2 in the sed command tells sed to replace the 2nd occurrence

Upvotes: 3

Kent
Kent

Reputation: 195039

awk may work for you:

kent$  awk -F, -v OFS="," 
       'BEGIN{print "name,Direction,Date,currentDirection,receivedDate"}
        NR==FNR&&NR>1{a[$1]=$0;next}
        FNR>1{printf "%s%s\n",$0,($1 in a?FS a[$1]:"")}' 2.csv 1.csv
name,Direction,Date,currentDirection,receivedDate
abc,sent,Jan 21 2014 02:06,abc,received,Jan 22 2014 02:06
xyz,sent,Nov 21 2014 01:09,xyz,received,Nov 22 2014 02:06
pqr,sent,Oct 21 2014 03:06

update

kent$  awk -F, -v OFS="," 'BEGIN{print "name,Direction,Date,currentDirection,receivedDate"}
        NR==FNR&&NR>1{a[$1]=$2 FS $3;next}
        FNR>1{printf "%s%s\n",$0,($1 in a?FS a[$1]:"")}' 2.csv 1.csv 
name,Direction,Date,currentDirection,receivedDate
abc,sent,Jan 21 2014 02:06,received,Jan 22 2014 02:06
xyz,sent,Nov 21 2014 01:09,received,Nov 22 2014 02:06
pqr,sent,Oct 21 2014 03:06

Upvotes: 4

Related Questions