chowpay

Reputation: 1687

python iterate over csv with blank lines

I have a csv that looks like this

file.csv

"File is","NameofFileA"
"randomdata","1" <-- size of file
"randomdata","32"
"randomdata","43"
 <---->[this is a blank line found in the file]
"File is","NameofFileB"
"randomdata","4" 
"randomdata","3"
"randomdata","1"

So what I want to do is end up with a list like this

NameofFileA Total = 76
NameofFileB Total = 8
...
..
.

I worked out how to get a total from the last column (column 2, row[1]), but it doesn't group the totals by NameofFileX:

with open(csvInput,"r") as inputFile, open(csvOutput ,"w") as outputFile:
    data = csv.reader(inputFile, delimiter=',', quotechar='"')
    total = 0
    headerline = inputFile.next()
    for row in data:
        print ', '.join(row)
        total += int(row[1])
    print total

Question: What is a Pythonic way to say "add up all the items for NameofFileA"?

The file name always follows the cell containing "File is", and that "header" row always comes after a blank line, as in the sample above.

I'm not sure how to detect a blank line, store row[1] of the row after it as the file name, then skip that row and total everything in row[1] until the next blank line.

Thanks

Upvotes: 0

Views: 746

Answers (1)

niemmi

Reputation: 17263

To get the numbers for each file, you could use itertools.takewhile to return rows from the CSV until a blank line is found. Then just sum the numbers and move on to the next file in the CSV:

import csv
from itertools import takewhile

res = []
with open('file.csv') as in_f:
    reader = csv.reader(in_f, delimiter=',', quotechar='"')

    # Read next name from CSV
    for _, name in reader:
        # Read rows and return numbers until blank line is found
        total = sum(int(row[1]) for row in takewhile(bool, reader))
        res.append((name, total))

print res

Output:

[('NameofFileA', 76), ('NameofFileB', 8)]

In the code above, bool is used as the predicate that tells takewhile whether a row should be returned. Since the CSV reader returns an empty row as [], and an empty list is False in a boolean context, takewhile stops there.
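To see why bool works as the predicate, here is a small sketch that runs csv.reader over an in-memory string and checks the truthiness of each row. The sample text is a shortened stand-in for file.csv, and it is written for Python 3 (print as a function). It assumes the blank line is truly empty; a line holding only spaces would come back as a non-empty row.

```python
import csv
import io

# In-memory stand-in for file.csv; the third line is a truly empty line.
sample = io.StringIO(
    '"File is","NameofFileA"\n'
    '"randomdata","1"\n'
    '\n'
    '"File is","NameofFileB"\n'
)

rows = list(csv.reader(sample, delimiter=',', quotechar='"'))

# bool(row) is False only for the empty list produced by the blank line.
truthiness = [bool(row) for row in rows]

print(rows)
print(truthiness)
```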

Then, for each row returned by takewhile, the generator expression takes the value from the second column and converts it to int. Finally, these numbers are summed to get the total.
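If you are on Python 3, the same idea could be wrapped in a function along these lines. This is only a sketch: totals_per_file is a made-up name, the sample string stands in for file.csv, and the extra check for empty header rows is an addition of mine so that repeated or trailing blank lines do not break the tuple unpacking.

```python
import csv
import io
from itertools import takewhile


def totals_per_file(lines):
    """Return a list of (name, total) pairs from an iterable of CSV lines."""
    reader = csv.reader(lines, delimiter=',', quotechar='"')
    results = []
    for header in reader:
        if not header:
            # Extra blank line between blocks or at the end: skip it.
            continue
        name = header[1]
        # takewhile consumes rows up to and including the next blank line.
        total = sum(int(row[1]) for row in takewhile(bool, reader))
        results.append((name, total))
    return results


# Stand-in for file.csv, using the numbers from the question.
sample = io.StringIO(
    '"File is","NameofFileA"\n'
    '"randomdata","1"\n'
    '"randomdata","32"\n'
    '"randomdata","43"\n'
    '\n'
    '"File is","NameofFileB"\n'
    '"randomdata","4"\n'
    '"randomdata","3"\n'
    '"randomdata","1"\n'
)

result = totals_per_file(sample)
for name, total in result:
    print('{} Total = {}'.format(name, total))
```

With a real file you would pass the open file object instead of the StringIO, e.g. open('file.csv', newline='').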

Upvotes: 2
