Reputation: 712
I have 2 csv file
file 1 contain
product_id, category_id, price
pid01,cat01,10
pid02,cat01,10
pid03,cat01,20
pid04,cat02,30
pid05,cat02,20
pid06,cat03,30
file 2 contain
category_id, category_name
cat01,Mouse
cat02,Cat
cat03,Fish
cat04,Dog
I need a result like this
product_id, category_id, category_name, price
pid01,cat01,Mouse,10
pid02,cat01,Mouse,10
pid03,cat01,Mouse,20
pid04,cat02,Cat,30
pid05,cat02,Cat,20
pid06,cat03,Fish,30
or
product_id, category_name, price
pid01,Mouse,10
pid02,Mouse,10
pid03,Mouse,20
pid04,Cat,30
pid05,Cat,20
pid06,Fish,30
How do I achive it in Bash or Awk?
Upvotes: 2
Views: 210
Reputation: 1
You can create a shell script (process_csv.sh) like so:
#!/bin/sh
data=`cat file1.csv | sed -n '/pid/,$ p'`
data2=`cat file2.csv`
echo "product_id, category_id, price, category_name" > final.csv
#since category_id is common in both files, we lookup category names based on that id.
for row in $data
do
cat_id=`printf $row | awk -F "," '{print $2'}`
category_name=`printf "$data2" | grep "$cat_id" | cut -f2 -d','`
#now we write category_name to file and append it to row/line with corresponding product_id
echo $row","$categor_name >> final.csv
done
Just run "./process_csv.sh" and final.csv file will contain your result
Upvotes: 0
Reputation: 157947
This awk will do it:
awk -F, 'NR==FNR{a[$1]=$2;next}FNR>1{print $1,$2,a[$2],$3}' OFS=, file2 file1
Btw, you would additionally need to add the headers. Let me explain the script in a multiline format:
# Specify the field delimiter and print the headers
BEGIN {
FS=OFS=","
$1="product_id"
$2="category_id"
$3="category_name"
$4="price"
print
}
# As long as the total number of records (NR) equals
# number of records is equal to the number of records
# in the current input file (FNR) we populate data
# from file2 to the lookup table 'a'
NR==FNR{
a[$1]=$2
next # Skip the following block and go on parsing file2
}
# Skip line 1 in file1, inject column 3 with the value from
# the lookup table and output the record
FNR>1{
print $1,$2,a[$2],$3
}
Please check also anubhava's comment. In gawk
or mawk
the printing of the headers can be achieved more simpler using -F', *'
. The optional space after the comma is because there is a space in your column headers. I would simply remove that space before processing.
Upvotes: 4
Reputation: 88583
With join:
join --header -t , -1 2 -2 1 -o 1.1,1.2,2.2,1.3 file1 file2
Output:
pid01,cat01,Mouse,10 pid02,cat01,Mouse,10 pid03,cat01,Mouse,20 pid04,cat02,Cat,30 pid05,cat02,Cat,20 pid06,cat03,Fish,30
Upvotes: 3