billyduc
billyduc

Reputation: 712

Awk search and append matching name from other csv file

I have 2 csv file

file 1 contain

product_id, category_id, price
pid01,cat01,10
pid02,cat01,10
pid03,cat01,20
pid04,cat02,30
pid05,cat02,20
pid06,cat03,30

file 2 contain

category_id, category_name
cat01,Mouse
cat02,Cat
cat03,Fish
cat04,Dog

I need a result like this

product_id, category_id, category_name, price
pid01,cat01,Mouse,10
pid02,cat01,Mouse,10
pid03,cat01,Mouse,20
pid04,cat02,Cat,30
pid05,cat02,Cat,20
pid06,cat03,Fish,30

or

product_id, category_name, price
pid01,Mouse,10
pid02,Mouse,10
pid03,Mouse,20
pid04,Cat,30
pid05,Cat,20
pid06,Fish,30

How do I achive it in Bash or Awk?

Upvotes: 2

Views: 210

Answers (3)

kundai tinarwo
kundai tinarwo

Reputation: 1

You can create a shell script (process_csv.sh) like so:

#!/bin/sh

data=`cat file1.csv | sed -n '/pid/,$ p'`
data2=`cat file2.csv`
echo "product_id, category_id, price, category_name" > final.csv
#since category_id is common in both files, we lookup category names based on that id.
for row in $data
            do
                    cat_id=`printf $row | awk -F "," '{print $2'}`
                    category_name=`printf "$data2" | grep "$cat_id" | cut -f2 -d','`
                    #now we write category_name to file and append it to row/line with corresponding product_id
                    echo $row","$categor_name >> final.csv


            done

Just run "./process_csv.sh" and final.csv file will contain your result

Upvotes: 0

hek2mgl
hek2mgl

Reputation: 157947

This awk will do it:

awk -F, 'NR==FNR{a[$1]=$2;next}FNR>1{print $1,$2,a[$2],$3}' OFS=, file2 file1

Btw, you would additionally need to add the headers. Let me explain the script in a multiline format:

# Specify the field delimiter and print the headers
BEGIN {
    FS=OFS=","
    $1="product_id"
    $2="category_id"
    $3="category_name"
    $4="price"
    print
}

# As long as the total number of records (NR) equals
# number of records is equal to the number of records
# in the current input file (FNR) we populate data
# from file2 to the lookup table 'a'
NR==FNR{
    a[$1]=$2
    next # Skip the following block and go on parsing file2
}

# Skip line 1 in file1, inject column 3 with the value from
# the lookup table and output the record
FNR>1{
    print $1,$2,a[$2],$3
}

Please check also anubhava's comment. In gawk or mawk the printing of the headers can be achieved more simpler using -F', *'. The optional space after the comma is because there is a space in your column headers. I would simply remove that space before processing.

Upvotes: 4

Cyrus
Cyrus

Reputation: 88583

With join:

join --header -t , -1 2 -2 1 -o 1.1,1.2,2.2,1.3 file1 file2

Output:

pid01,cat01,Mouse,10
pid02,cat01,Mouse,10
pid03,cat01,Mouse,20
pid04,cat02,Cat,30
pid05,cat02,Cat,20
pid06,cat03,Fish,30

Upvotes: 3

Related Questions