alex
alex

Reputation: 75

sum multiple lines in csv when fields match within file

I have a file that I've trimmed down to look like the following:

"Reno","40.00"
"Reno","40.00"
"Reno","80.00"
"Reno","60.00"
"Lakewood","150.00"
"Altamonte Springs","50.25"
"Altamonte Springs","25.00"
"Altamonte Springs","25.00"
"Sandpoint","50.00"
"Lenoir City","987.00"

etc.

What I want to end up with is a sum of the total amount per city. That is:

"Reno","220.00"
"Lakewood","150.00"
"Altamonte Springs","100.25"

Etc.

Fair warning, the data set is not necessarily continuous-that is, a city may appear once here, once a thousand lines down, and 3 more times at the end.

I've been trying to use the following awk script:

awk -F "," '{array[$1]+=$2} END { for (i in array) {print i"," array[i]}}' test1.csv > test6.csv

The results I'm getting look like this:

"Matawan",0
"Bay Side",0
"Pataskala",0
"Dorothy",0
"Haymarket",0
"Myrtle Point",0

Etc. All zeros on the second column, and no quotes.

I'm obviously missing something, but I don't know what or where else to look. What am I missing?

Thanks.

Upvotes: 3

Views: 466

Answers (4)

CS Pei
CS Pei

Reputation: 11047

The reason you failed is because of the double quotes.

Do something like this:

sed 's/"//g' file.csv | awk -F "," '{array[$1]+=$2}END{for(i in array) {print "\""  i "\""  ","  "\"" array[i] "\"" }}' 

"Lenoir City","987"
"Reno","220"
"Lakewood","150"
"Sandpoint","50"
"Altamonte Springs","100.25"

Upvotes: 3

Chris Seymour
Chris Seymour

Reputation: 85795

You don't need pre-processing or nasty escaping:

$ awk -F'"' '{a[$2]+=$4}END{for(k in a)printf "%s,%s\n",FS k FS,FS a[k] FS}' file
"Lenoir City","987"
"Reno","220"
"Lakewood","150"
"Sandpoint","50"
"Altamonte Springs","100.25"

Upvotes: 1

jkshah
jkshah

Reputation: 11703

" is causing problem in your input. First remove them using sed and print it back using printf inside awk

Try following:

sed 's/"//g' input.csv | awk -F "," '{array[$1]+=$2} END { for (i in array) {printf "\"%s\",\"%\"\n", i, array[i]}}' > output.csv

Jumbled Input

"Reno","40.00"
"Reno","60.00"
"Lakewood","150.00"
"Altamonte Springs","50.25"
"Altamonte Springs","25.00"
"Reno","80.00"
"Sandpoint","50.00"
"Reno","40.00"
"Lenoir City","987.00"
"Altamonte Springs","25.00"

Output

"Reno","220.00"
"Altamonte Springs","100.25"
"Lakewood","150.00"
"Lenoir City","987.00"
"Sandpoint","50.00"

Upvotes: 1

Kent
Kent

Reputation: 195079

This awk one-liner would give exactly what you want with formatting :

awk -F'","' '{a[$1]+=$2*1}END{for (x in a)printf "%s\",\"%.2f\"\n", x,a[x]}' file

test with your data:

kent$  cat f
"Reno","40.00"
"Reno","40.00"
"Reno","80.00"
"Reno","60.00"
"Lakewood","150.00"
"Altamonte Springs","50.25"
"Altamonte Springs","25.00"
"Altamonte Springs","25.00"
"Sandpoint","50.00"
"Lenoir City","987.00"

kent$  awk -F'","' '{a[$1]+=$2*1}END{for (x in a)printf "%s\",\"%.2f\"\n", x,a[x]}' f
"Lakewood","150.00"
"Reno","220.00"
"Lenoir City","987.00"
"Sandpoint","50.00"
"Altamonte Springs","100.25"

Upvotes: 2

Related Questions