lcbotta

Reputation: 11

How do I find the average of a column of a csv file in python?

I'm trying to find the average APM (actions per minute) from a list of APMs at different times in a csv file. When I try to do it using this code:

import csv

with open('test_game.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        x = (row['Total APM'])
        x_sum = sum(x)
        x_length = len(x)
        x_average = x_sum / x_length
        print(x_average)

I get this error:

Traceback (most recent call last):
  File "C:/Users/Luke's Laptop/Desktop/magicka_practice.py", line 7, in <module>
    x_sum = sum(x)
TypeError: unsupported operand type(s) for +: 'int' and 'str'

Does this mean I have to convert the values from the csv rows to a list of integers (and if so, how)? Or is there just something blatantly wrong with my code? I'm very new to this, so this might be a stupid question or I might be missing something incredibly obvious. I appreciate any help I can get.

Upvotes: 1

Views: 7383

Answers (1)

James Mills

Reputation: 19030

Change this line:

x = (row['Total APM'])

to:

x = int(row['Total APM'])

This converts the string read from the CSV into an actual integer that you can do arithmetic with. Note that sum() and len() still need a collection of such values, not a single integer.

Here is what is probably happening in your code:

>>> x = "1"
>>> sum(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>> y = int(x)
>>> sum([y])
1

Python is strongly typed, so in general you cannot combine incompatible types in a single operation. For example, an int + a str results in a TypeError.
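
To make that concrete, here is a small sketch of the same failure and its fix. The values in raw_values are made up for illustration; they stand in for what csv.DictReader hands back, which is always strings. It assumes Python 3, where / is true division.

```python
# Hypothetical sample data: DictReader yields every field as a str,
# even when the text looks like a number.
raw_values = ["120", "95", "143"]

try:
    total = 0 + raw_values[0]  # int + str is not allowed
except TypeError as exc:
    error_message = str(exc)

# Convert each string to an int first, then sum() and len() work as expected.
numbers = [int(v) for v in raw_values]
average = sum(numbers) / len(numbers)

print(error_message)
print(average)
```

The first print shows the same TypeError message as in the question; the second shows the average of the converted values.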

What you probably want is this:

import csv

with open('test_game.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    xs = []
    for row in reader:
        try:
            x = int(row['Total APM'])
            xs.append(x)
        except ValueError:
            print("Error converting: {0:s}".format(row['Total APM']))
    x_average = sum(xs) / len(xs)
    print(x_average)

NB: You want to compute the average of the whole column after iterating through all rows. So collect the values inside the loop, then compute the average outside of it.

Update: Alternatively (more in line with your original algorithm) you could do this:

import csv

with open('test_game.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    x_sum = x_length = 0
    for row in reader:
        try:
            x = row['Total APM']
            x_sum += int(x)
            x_length += 1
        except ValueError:
            print("Error converting: {0:s}".format(x))
    x_average = x_sum / x_length
    print(x_average)

This keeps a running sum and a count of the rows, but NB that you still have to compute the average outside of the loop, unless you want to compute a running average :)
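
If a running average is actually what you are after, a minimal sketch could look like this. The function name running_averages and the sample list are hypothetical, not part of the original code, and it assumes Python 3 and values that are already converted to integers.

```python
def running_averages(values):
    """Yield the average of all values seen so far, one result per input value."""
    total = 0
    count = 0
    for value in values:
        total += value
        count += 1
        yield total / count


apms = [100, 200, 300]  # made-up APM readings
averages = list(running_averages(apms))
print(averages)
```

Each result only depends on the running sum and count, so nothing has to be stored besides those two numbers.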

Update #2: As suggested by @Karl -- It is a good idea to catch any errors and handle them appropriately. The "appropriately" is up to you and depends on the use-case :)
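
As one possible take on "appropriately", here is a sketch that skips values it cannot convert, remembers them so you can inspect them afterwards, and returns None instead of dividing by zero when nothing could be converted. The helper column_average and the in-memory sample data are assumptions for illustration; with a real file you would pass the object returned by open('test_game.csv') instead.

```python
import csv
import io


def column_average(csvfile, column):
    """Return (average, bad_values) for one column of an open CSV file.

    average is None when no value could be converted to an int.
    """
    values = []
    bad_values = []
    for row in csv.DictReader(csvfile):
        raw = row[column]
        try:
            values.append(int(raw))
        except (TypeError, ValueError):
            # ValueError: text that is not an integer.
            # TypeError: a short row where the field is missing (None).
            bad_values.append(raw)
    average = sum(values) / len(values) if values else None
    return average, bad_values


# In-memory stand-in for test_game.csv; the contents are made up.
sample = io.StringIO("Time,Total APM\n0:30,100\n1:00,n/a\n1:30,200\n")
average, bad_values = column_average(sample, "Total APM")
print(average, bad_values)
```

Whether skipping bad rows is right depends on your data; raising an error on the first bad value is just as reasonable if the file is supposed to be clean.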

Upvotes: 4
