Mallik Kumar

Reputation: 550

Match 2 columns in 2 files and get another value from the first file

I have 2 csv files which have the following structure:

File 1:
date,keyword,location,page
2019-04-11,ABC,mumbai,http://www.insurers.com
and so on.

File 2:
date,site,market,location,url 
2019-05-12,denmark,de ,Frankfurt,http://lufthansa.com
2019-04-11,Netherlands,nl,amsterdam,http://www.insurers.com

The problem is that I need to match rows where both the date and the URL are the same in the two files. Example:

2019-04-11 and http://www.insurers.com (File 1)
with 
2019-04-11 and http://www.insurers.com (File 2)

Edit: If this condition is satisfied, the keyword (ABC) from File 1 should be inserted into File 2 as the third column (a new column).

Expected Output:

date,site,keyword,market,location,url
2019-04-11,Netherlands,ABC,nl,amsterdam,http://www.insurers.com

I tried putting the dates and URLs into a map in Java, but there are too many duplicated URLs, so I am looking for a bash, awk, grep or sed solution. Thanks.

Upvotes: 1

Views: 111

Answers (2)

Ed Morton

Reputation: 203617

$ awk '
    BEGIN { FS=OFS="," }
    NR==FNR { m[$1,(NR>1?$4:"url")]=$2; next }
    ($1,$5) in m { $2=$2 OFS m[$1,$5]; print }
' file1 file2
date,site,keyword,market,location,url
2019-04-11,Netherlands,ABC,nl,amsterdam,http://www.insurers.com
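To try the awk approach above end to end, here is a commented, self-contained sketch. It recreates the sample rows from the question in two files named file1 and file2 (assumed names), runs the same date+URL lookup, and writes the result to joined.csv (a hypothetical output file name).

```shell
#!/usr/bin/env bash
# Sketch only: file1, file2 and joined.csv are assumed names for illustration.
set -euo pipefail

# Sample data copied from the question.
printf '%s\n' \
  'date,keyword,location,page' \
  '2019-04-11,ABC,mumbai,http://www.insurers.com' > file1

printf '%s\n' \
  'date,site,market,location,url' \
  '2019-05-12,denmark,de ,Frankfurt,http://lufthansa.com' \
  '2019-04-11,Netherlands,nl,amsterdam,http://www.insurers.com' > file2

awk '
    BEGIN { FS = OFS = "," }          # read and write comma-separated fields
    NR == FNR {                       # true only while reading the first file (file1)
        # Key on date ($1) plus page URL ($4). The header row gets the key
        # ("date","url") so it matches the file2 header and yields "keyword".
        key_url = (FNR > 1 ? $4 : "url")
        m[$1, key_url] = $2           # remember the keyword for this (date, URL) pair
        next
    }
    ($1, $5) in m {                   # file2 row whose (date, url) pair was seen in file1
        $2 = $2 OFS m[$1, $5]         # insert the keyword right after the site column
        print
    }
' file1 file2 > joined.csv

cat joined.csv
```

Because the lookup table is keyed on the (date, URL) pair rather than on the URL alone, a URL that repeats on different dates does not collide. If file1 contains several rows with the same date and URL, each later row overwrites the earlier one, so only the last keyword is inserted.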

Upvotes: 2

user7712945


Try GNU sed:

sed -En 's!^([0-9]{4}-[0-9]+-[0-9]+,).+(http://\w.+)!s#^\1([^,]+),[^,]+,\\s*\2#\\1#p!p' File2| sed -Enf - File1 >Result

Upvotes: 0
