Reputation: 115
I want to use awk to read a csv file. The csv file contains 5 columns, c1, c2, c3, c4, c5. I want judge the c1, c2 and c3 together is unique, like database constraint.
here is sample csv file:
c1,c2,c3,c4,c5
1886,5141,11-2011,62242.57,52.71
1886,5140,11-2011,63763.75,52.22
23157666,4747,11-2011,71.07,83.33
1886,5141,11-2011,4645.45,2135.45
In this case, row1 and row4 violate the unique constraint, and prompt the error message.
How to implement it with awk? Thanks a lot in advance.
Upvotes: 2
Views: 651
Reputation: 359905
This lists all the lines for each duplication. It only outputs the duplication message once for each set.
awk -F, '{count[$1,$2,$3]++; line[$1,$2,$3] = line[$1,$2,$3] ", " NR} END {for (i in count) {if (count[i] > 1) {v=i; gsub(SUBSEP, FS, v); print "Error: lines", substr(line[i], 3), "collide on value:", v}}}'
Broken out on multiple lines:
awk -F, '
{
count[$1,$2,$3]++;
line[$1,$2,$3] = line[$1,$2,$3] ", " NR
}
END {
for (i in count) {
if (count[i] > 1) {
v = i;
gsub(SUBSEP, FS, v);
print "Error: lines", substr(line[i], 3), "collide on value:", v
}
}
}'
This is a variation on Kevin's answer.
Upvotes: 1
Reputation: 56049
awk -F, 'line[$1,$2,$3] {printf "Error: lines %d and %d collide\n", line[$1,$2,$3], NR; next} {line[$1,$2,$3] = NR}'
Upvotes: 5