Reputation: 73
I have the following files:
file1.csv
ID, Name, Address, Phone, Favorite Color/Colors Match
1, Jim, 12 Main,123-456-7890, Blue
2, Kim,11 Bush,987-654-3210, RedGreen
3, Tim,33 Main,111-111-1111,YellowOrange
4, Zim,66 Main,111-555-1111,YellowOrange
file2.csv
ID, Name, Address, Phone, Favorite Color/Colors Match, Blood Type, Left/Right Handed
1, Jon, 122 Main,333-456-7890, Red, A Pos, Right
2, Tom,111 Bush,999-654-3210, RedGreen, A Neg, Right
3, Tam,333 Main,111-222-1111,YellowOrange, O Neg, Left
4, Zam,99 Main,555-555-1111,Blue, A Pos, Left
I would like the following output for each file:
file1.csv
ID, Name, Address, Phone, Favorite Color/Colors Match
1, Jim, 12 Main,123-456-7890, No Match
2, Kim,11 Bush,987-654-3210, Match
3, Tim,33 Main,111-111-1111,Match
4, Zim,66 Main,111-555-1111,No Match
file2.csv
ID, Name, Address, Phone, Favorite Color/Colors Match, Blood Type, Left/Right Handed
1, Jon, 122 Main,333-456-7890, No Match, A Pos, Right
2, Tom,111 Bush,999-654-3210, Match, A Neg, Right
3, Tam,333 Main,111-222-1111,Match, O Neg, Left
4, Zam,99 Main,555-555-1111,No Match, A Pos, Left
Basically, I want to compare one field based on the ID field in two separate files and determine they match or not.
I tried doing this using awk:
awk -F',' 'NR==FNR {n[$5]=$0} {if ($1 == n[$1] && $5 == n[$5])
n[$5]="Match"; else $n[5]="No Match";}1' OFS=, file1.csv file2.csv > testfile.csv
But I was left with some unexpected results. I am new to the power of awk so any recommendations would be appreciated. Thank you.
Upvotes: 1
Views: 137
Reputation: 784998
You can use awk like this:
cat csv.awk
BEGIN{FS=OFS=","}
FNR==NR {
a[$1 SUBSEP $5]
next
}
{
$5 = (($1 SUBSEP $5) in a)?"Match":"No Match"
}
1
Then use it as:
awk -f csv.awk file2.csv file1.csv
ID, Name, Address, Phone,Match
1, Jim, 12 Main,123-456-7890,No Match
2, Kim,11 Bush,987-654-3210,Match
3, Tim,33 Main,111-111-1111,Match
4, Zim,66 Main,111-555-1111,No Match
and also:
awk -f csv.awk file1.csv file2.csv
ID, Name, Address, Phone,Match, Blood Type, Left/Right Handed
1, Jon, 122 Main,333-456-7890,No Match, A Pos, Right
2, Tom,111 Bush,999-654-3210,Match, A Neg, Right
3, Tam,333 Main,111-222-1111,Match, O Neg, Left
4, Zam,99 Main,555-555-1111,No Match, A Pos, Left
Update: If you want to do it a single awk command then here it is.
cat csv.awk
BEGIN{FS=OFS=","}
{
key = $1 SUBSEP $5
}
FNR == NR {
a[key]
next
}
FILENAME == ARGV[2] {
if (key in a) {
$5 = "Match"
b[key]
}
else
$5 = "No Match"
print > "_" ARGV[2]
next
}
{
$5 = (key in b)?"Match":"No Match"
print > "_" ARGV[3]
}
Then execute it as:
awk -f csv.awk file2.csv file1.csv file2.csv &&
mv _file1.csv file1.csv &&
mv _filw2.csv file2.csv
Upvotes: 1