learningcurve

Reputation: 19

Sum specific columns in a CSV file

I have a CSV file with 100 columns, and I want to calculate the sum of each column from column 4 to n. I can compute the sum for a single column, but my attempt to do it for all columns fails. Here is what I have so far:

import decimal
import numpy as np
import os as os
import csv as csv
import re as re
import sys

col=10
values=[]
with open('test.csv', 'r') as f:
    reader = csv.reader(f)
    headers = reader.next()
    for line in reader:
    #print line
        line = [int(i) for i in line]
    col_totals = [sum(result) for result in zip(*line)]
    print col_totals
       #values.append(int(line[col]))
       #csum=sum(values)
    #print csum  

Thanks,

Upvotes: 0

Views: 3382

Answers (2)

gboffi

Reputation: 25023

If you want to sum each column in a contiguous range of columns, this will do:

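import csv

i, j = 3, 5  # 0-based slice bounds: sums columns 3 and 4 (the 4th and 5th)

with open('test.csv', 'r') as f:
    reader = csv.reader(f)
    headers = next(reader)  # skip the header row (works in Python 2.6+ and 3)
    table = list(reader)
    # transpose rows into columns, keep the requested slice, sum each column
    sums = [sum(float(elt) for elt in col) for col in list(zip(*table))[i:j]]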

To sum an arbitrary, non-contiguous set of columns, try the following:

import csv

requested = [4, 7, 12, 13, 21, 81]  # 0-based column indices

with open('test.csv', 'r') as f:
    reader = csv.reader(f)
    headers = next(reader)  # skip the header row (works in Python 2.6+ and 3)
    table = list(reader)
    sums = [sum(float(elt) for elt in col) for i, col in enumerate(zip(*table)) if i in requested]
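
For a large file, the same column sums can be accumulated row by row instead of loading the whole table with `list(reader)`. The following is a minimal sketch for Python 3, not something from the snippets above: the helper name `sum_columns` and the sample data are hypothetical, and it assumes every selected cell parses as a float.

```python
import csv
import io


def sum_columns(lines, start, stop=None):
    """Sum columns start..stop-1 (0-based) of CSV input, skipping the header row.

    `lines` is any iterable of CSV lines, e.g. an open file object.
    """
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    totals = None
    for row in reader:
        values = [float(cell) for cell in row[start:stop]]
        if totals is None:
            totals = values  # first data row initialises the running totals
        else:
            totals = [t + v for t, v in zip(totals, values)]
    return totals if totals is not None else []


# Hypothetical sample data standing in for test.csv
sample = io.StringIO("a,b,c,d,e\n1,2,3,4,5\n6,7,8,9,10\n")
print(sum_columns(sample, 3))  # columns d and e: prints [13.0, 15.0]
```

With a real file, pass the open file object instead, for example `with open('test.csv', newline='') as f: print(sum_columns(f, 3))`.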

Upvotes: 0

acushner

Reputation: 9946

This is very easy in pandas:

import pandas as pd
df = pd.read_csv('test.csv')
# [4:] is 0-based, so this sums every column from the fifth onward
df[df.columns[4:]].sum()

If you want a per-row sum across those columns instead, sum along axis 1:

df[df.columns[4:]].sum(axis=1)
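
As a quick check of what the two calls return, here is a small sketch on in-memory data. The sample values are made up, and `iloc` is used here as an equivalent positional selection rather than the `df.columns` indexing above. Note that position 4 is the fifth column, so use `3:` if the fourth column should be included.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real CSV file
sample = io.StringIO("a,b,c,d,e,f\n1,2,3,4,5,6\n7,8,9,10,11,12\n")
df = pd.read_csv(sample)

tail = df.iloc[:, 4:]          # columns e and f (0-based position 4 onward)
col_sums = tail.sum()          # one total per column
row_sums = tail.sum(axis=1)    # one total per row

print(col_sums.tolist())  # prints [16, 18]
print(row_sums.tolist())  # prints [11, 23]
```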

Upvotes: 1
