Reputation: 155
I am an extremely novice Python user trying to sum columns of data in a .csv file. I found other answers that really helped me to get started (here and here, for example).
However, my problem is that I want to loop over my file to get sums for all columns.
My formatted data look like this:
z y x w v u
a 0 8 7 6 0 5
b 0 0 5 4 0 3
c 0 2 3 4 0 3
d 0 6 7 8 0 9
Or like this in .csv format:
,z,y,x,w,v,u
a,0,8,7,6,0,5
b,0,0,5,4,0,3
c,0,2,3,4,0,3
d,0,6,7,8,0,9
For right now, I am just trying to get the iteration to work. I will worry about the summing later. Here's my code:
import csv
data = file("test.csv", "r")
headerrow = data.next()
headerrow = headerrow.strip().split(",")
end = len(headerrow)
for i in range (1, end):
for row in csv.reader(data):
print row[i]
Here is what I get:
>>>
0
0
0
0
>>>
So, it prints the values at index 1 for each row, but does not continue through the other indices.
What obvious thing am I missing here?
Update:
Following the very helpful suggestions and explanations, I now have this:
import csv
with open("test.csv") as data:
headerrow = next(data)
delim = "," if "," == headerrow[0] else " "
headerrow = filter(None, headerrow.rstrip().split(delim))
reader = csv.reader(data, delimiter=delim, skipinitialspace=True)
zipped = zip(*reader)
print zipped
strings = next(zipped)
print ([sum(map(int,col)) for col in zipped])
This returns an error:
Traceback (most recent call last):
File "C:\Users\the hexarch\Desktop\remove_total_absences_test.py", line 9, in <module>
strings = next(zipped)
TypeError: list object is not an iterator
I do not understand this...? Sorry!
Upvotes: 3
Views: 12780
Reputation: 180411
import csv
with open('in.csv')as f:
head = next(f)
# decide delimiter by what is in header
delim = "," if "," == head[0] else " "
# need to filter empty strings
head = filter(None, head.rstrip().split(delim))
# skipinitialspace must be set as you have two spaces delimited
reader = csv.reader(f,delimiter=delim, skipinitialspace=True)
# transpose rows
zipped = zip(*reader)
# skip first column
strings = next(zipped)
# sum each column
print([sum(map(int,col)) for col in zipped])
[0, 16, 22, 22, 0, 20]
To create a dict matching headers to colunm sums you can so something like:
print(dict(zip(list(head), (sum(map(int,col)) for col in zipped))))
which outputs:
{'u': 20, 'w': 22, 'x': 22, 'z': 0, 'y': 16, 'v': 0}
I used python3 for all of the above, if you are using python2 replace with:
zip -> itertools.izip
filter -> itertools.izip
map -> itertools.imap
Python 2 code:
import csv
from itertools import izip, imap, ifilter
with open('in.csv')as f:
head = next(f)
# decide delimiter by what is in header
delim = "," if "," == head[0] else " "
# need to filter empty strings
head = ifilter(None, head.rstrip().split(delim))
# skipinitialspace must be set as you have two spaces delimited
reader = csv.reader(f,delimiter=delim, skipinitialspace=True)
# transpose rows
zipped = izip(*reader)
# skip first column
strings = next(zipped)
# sum each column
print([sum(imap(int,col)) for col in zipped])
Output:
[0, 16, 22, 22, 0, 20]
If you are doing a lot of this kind of work then pandas especially pandas.read_csv may be useful, below is a very basic example, some pandas guru may hopefully add to it:
import pandas as pd
df = pd.read_csv("in.csv")
print(df.sum())
Unnamed: 0 abcd
z 0
y 16
x 22
w 22
v 0
u 20
dtype: object
Upvotes: 6
Reputation: 20025
You can use numpy
:
import csv
import numpy as np
with open("test.csv") as f:
r = csv.reader(f, delimiter=",")
# For space format: r = csv.reader(f, delimiter=" ", skipinitialspace=True)
# Thanks to Padraic Cunningham ^^
next(r) # Skip header row
sums = sum((np.array(map(int, row[1:])) for row in r))
Result:
>>> sums
array([ 0, 16, 22, 22, 0, 20])
Upvotes: 3
Reputation: 5289
This may clarify some of what exactly is going on... it looks like you're slightly over complicating things. This is very simple Python and not intended to be a direct or final solution to your problem, but more so help understand what is happening.
import csv
sumthree = 0
with open('test.csv', 'rb') as f: # Open the file (always use binary 'rb' mode for CSV files)
header = next(f) # Extract the header line from the file
csvr = csv.reader(f) # Create a CSV object with the rest of the file
for row in csvr:
print row # Now loop over the file and print each row
sumthree += int(row[2])
print sumthree
At this point each row
is going to print as a list, e.g. ['a','0','8','7','6','0','5']
So every iteration through that loop, we're moving down row by row. row[0]
is going to be the first column, row[1]
will be the 2nd, etc. If you want to sum the 3rd column of the file you could use sumthree += int(row[2])
. At the end of this we print sumthree
and see the total of all the numbers in the 3rd column.
Upvotes: 2