Reputation: 550
I have two CSV files with the following structure:
File 1:
date,keyword,location,page
2019-04-11,ABC,mumbai,http://www.insurers.com
and so on.
File 2:
date,site,market,location,url
2019-05-12,denmark,de,Frankfurt,http://lufthansa.com
2019-04-11,Netherlands,nl,amsterdam,http://www.insurers.com
The problem is I need to match the dates in both files as well as the URLs. Example:
2019-04-11 and http://www.insurers.com (File 1)
with
2019-04-11 and http://www.insurers.com (File 2)
Edit:
If this condition is satisfied, the keyword (ABC) from File 1 should be inserted into File 2 as a new third column.
Expected Output:
date,site,keyword,market,location,url
2019-04-11,Netherlands,ABC,nl,amsterdam,http://www.insurers.com
I have tried putting the dates and URLs in a map in Java, but there are too many duplicated URLs. So I am looking for a bash, awk, grep or sed solution. Thanks.
Upvotes: 1
Views: 111
Reputation: 203617
$ awk '
BEGIN { FS=OFS="," }                           # read and write comma-separated fields
NR==FNR { m[$1,(NR>1?$4:"url")]=$2; next }     # file1: index keyword by (date,url); the header row maps ("date","url") -> "keyword" so the output header comes out right too
($1,$5) in m { $2=$2 OFS m[$1,$5]; print }     # file2: on a (date,url) match, append the keyword after the site column
' file1 file2
date,site,keyword,market,location,url
2019-04-11,Netherlands,ABC,nl,amsterdam,http://www.insurers.com
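If join(1) is available, here is an equivalent sketch (assuming the files are named file1 and file2 as above, and that no field contains an embedded comma): key each file by "date,url", sort on that key, join, then reorder the columns.

```shell
# Build "date,url<TAB>payload" keyed files, skipping the header rows.
awk -F, 'NR>1 { print $1 "," $4 "\t" $2 }' file1 | sort > keys1   # payload: keyword
awk -F, 'NR>1 { print $1 "," $5 "\t" $0 }' file2 | sort > keys2   # payload: whole file2 line

# Emit the header, then join on the composite key and splice the keyword
# in as the third column of the file2 record.
printf 'date,site,keyword,market,location,url\n'
join -t "$(printf '\t')" keys1 keys2 |
awk -F'\t' 'BEGIN { OFS="," } { split($3, f, ","); print f[1], f[2], $2, f[3], f[4], f[5] }'
```

This is longer than the awk one-liner, but the sort/join step scales to inputs too large to hold in memory.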
Upvotes: 2
Reputation:
Try GNU sed. The first sed turns each File2 line into a sed substitution command keyed on its date and URL; the second sed reads that generated script from stdin (`-f -`) and applies it to File1:
sed -En 's!^([0-9]{4}-[0-9]+-[0-9]+,).+(http://\w.+)!s#^\1([^,]+),[^,]+,\\s*\2#\\1#p!p' File2| sed -Enf - File1 >Result
Upvotes: 0