Reputation: 75
I have a file that I've trimmed down to look like the following:
"Reno","40.00"
"Reno","40.00"
"Reno","80.00"
"Reno","60.00"
"Lakewood","150.00"
"Altamonte Springs","50.25"
"Altamonte Springs","25.00"
"Altamonte Springs","25.00"
"Sandpoint","50.00"
"Lenoir City","987.00"
etc.
What I want to end up with is a sum of the total amount per city. That is:
"Reno","220.00"
"Lakewood","150.00"
"Altamonte Springs","100.25"
Etc.
Fair warning, the data set is not necessarily continuous-that is, a city may appear once here, once a thousand lines down, and 3 more times at the end.
I've been trying to use the following awk script:
awk -F "," '{array[$1]+=$2} END { for (i in array) {print i"," array[i]}}' test1.csv > test6.csv
The results I'm getting look like this:
"Matawan",0
"Bay Side",0
"Pataskala",0
"Dorothy",0
"Haymarket",0
"Myrtle Point",0
Etc. All zeros on the second column, and no quotes.
I'm obviously missing something, but I don't know what or where else to look. What am I missing?
Thanks.
Upvotes: 3
Views: 466
Reputation: 11047
The reason you failed is because of the double quotes.
Do something like this:
sed 's/"//g' file.csv | awk -F "," '{array[$1]+=$2}END{for(i in array) {print "\"" i "\"" "," "\"" array[i] "\"" }}'
"Lenoir City","987"
"Reno","220"
"Lakewood","150"
"Sandpoint","50"
"Altamonte Springs","100.25"
Upvotes: 3
Reputation: 85795
You don't need pre-processing or nasty escaping:
$ awk -F'"' '{a[$2]+=$4}END{for(k in a)printf "%s,%s\n",FS k FS,FS a[k] FS}' file
"Lenoir City","987"
"Reno","220"
"Lakewood","150"
"Sandpoint","50"
"Altamonte Springs","100.25"
Upvotes: 1
Reputation: 11703
"
is causing problem in your input. First remove them using sed
and print it back using printf
inside awk
Try following:
sed 's/"//g' input.csv | awk -F "," '{array[$1]+=$2} END { for (i in array) {printf "\"%s\",\"%\"\n", i, array[i]}}' > output.csv
Jumbled Input
"Reno","40.00"
"Reno","60.00"
"Lakewood","150.00"
"Altamonte Springs","50.25"
"Altamonte Springs","25.00"
"Reno","80.00"
"Sandpoint","50.00"
"Reno","40.00"
"Lenoir City","987.00"
"Altamonte Springs","25.00"
Output
"Reno","220.00"
"Altamonte Springs","100.25"
"Lakewood","150.00"
"Lenoir City","987.00"
"Sandpoint","50.00"
Upvotes: 1
Reputation: 195079
This awk one-liner would give exactly what you want with formatting :
awk -F'","' '{a[$1]+=$2*1}END{for (x in a)printf "%s\",\"%.2f\"\n", x,a[x]}' file
test with your data:
kent$ cat f
"Reno","40.00"
"Reno","40.00"
"Reno","80.00"
"Reno","60.00"
"Lakewood","150.00"
"Altamonte Springs","50.25"
"Altamonte Springs","25.00"
"Altamonte Springs","25.00"
"Sandpoint","50.00"
"Lenoir City","987.00"
kent$ awk -F'","' '{a[$1]+=$2*1}END{for (x in a)printf "%s\",\"%.2f\"\n", x,a[x]}' f
"Lakewood","150.00"
"Reno","220.00"
"Lenoir City","987.00"
"Sandpoint","50.00"
"Altamonte Springs","100.25"
Upvotes: 2