Python CSV and Sum

Question

I would like to sum the amounts by company name, but the company name is often written in different formats. For example, Apple Inc is sometimes written as "Apple computer" or "Apple Inc." Also, I don't know how to handle the header row.

My file format is CSV.

company amount
a   20
b   10
A'  30
bb  20

I would like to do something like this:

while True:
    line = f.readline()
    if line == '':
        break
    if 'Apple' in line:
        total += amount

Dave · Accepted Answer

You're going to need to map the name variations somehow: either total each name separately and combine the totals afterward by hand, or build a dictionary up front that identifies all the aliases used by each company. A test like if 'Apple' in line: is dangerous because it can silently mix amounts from different companies together; "Apple Vacations" and "Applebee's Restaurant" both contain "Apple".
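A small illustration of why the substring test is unsafe. The rows here are hypothetical; the company names are borrowed from the alias dictionary in this answer, and "Banana Co" is made up as a non-matching control.

```python
# Hypothetical rows: (company name, amount)
rows = [
    ("Apple Computer", 30),
    ("Apple Vacations", 20),
    ("Applebee's Restaurant", 10),
    ("Banana Co", 5),
]

# Naive substring test, as in the question's sketch
naive_total = 0
for name, amount in rows:
    if "Apple" in name:
        naive_total += amount

print(naive_total)  # 60: three different companies were mixed together
```

Only 30 of that 60 actually belongs to Apple; nothing in the output warns you about the other two.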

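from collections import defaultdict

# Map every known alias to a single company ID
Company = {"Apple": 1, "Apple Computer": 1, "AAPL": 1, "Apple, Inc": 1,
           "Apple Vacations": 2, "Applebee's Restaurant": 3}

totals = defaultdict(int)  # company ID -> running total
totals[Company[name]] += amount  # inside the loop over each (name, amount) row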
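A runnable sketch of the alias-dictionary approach that also handles the header row from the question. It assumes the file is comma-separated with a "company,amount" header; the inline sample data and the "unknown" set are hypothetical additions for illustration, and a real program would open the file instead of using io.StringIO.

```python
import csv
import io
from collections import defaultdict

# Alias table: every known spelling -> one company ID
Company = {"Apple": 1, "Apple Computer": 1, "AAPL": 1, "Apple, Inc": 1,
           "Apple Vacations": 2, "Applebee's Restaurant": 3}

# Hypothetical CSV content standing in for the real file
data = io.StringIO(
    "company,amount\n"
    "Apple,20\n"
    "AAPL,10\n"
    "\"Apple, Inc\",30\n"
    "Apple Vacations,20\n"
)

totals = defaultdict(float)  # company ID -> summed amount
unknown = set()              # names missing from the alias table

# DictReader treats the first row as the header, so it is never summed
for row in csv.DictReader(data):
    name = row["company"].strip()
    amount = float(row["amount"])
    if name in Company:
        totals[Company[name]] += amount
    else:
        unknown.add(name)    # flag it for review instead of guessing

print(dict(totals))  # {1: 60.0, 2: 20.0}
print(unknown)       # set()
```

Using the csv module rather than splitting lines by hand means a quoted name containing a comma, such as "Apple, Inc", is parsed as one field.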

Edit 2: If you don't know all the company names beforehand, then the best you can do is keep track of the unique names contained in the input file and decide whether to merge them later:

Company = {}
for name, amount in rows:  # rows: (name, amount) pairs parsed from the input file
    if name in Company:
        Company[name] += amount
    else:
        Company[name] = amount
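A sketch of this two-step approach, which is also the "total each name separately and combine afterward by hand" option mentioned at the start of the answer. The sample data, the "company"/"amount" column names and the merge_into table are hypothetical; in practice you would fill in merge_into after reviewing the printed list of distinct names.

```python
import csv
import io

# Hypothetical input with a header row
data = io.StringIO(
    "company,amount\n"
    "Apple Inc,20\n"
    "Apple computer,10\n"
    "Banana Co,5\n"
    "Apple Inc,30\n"
)

# Pass 1: total each raw name separately (DictReader consumes the header)
Company = {}
for row in csv.DictReader(data):
    name, amount = row["company"], float(row["amount"])
    Company[name] = Company.get(name, 0) + amount

# Review the distinct names before deciding which ones to merge
for name in sorted(Company):
    print(name, Company[name])

# Pass 2: hand-written merge table (hypothetical), applied afterward
merge_into = {"Apple computer": "Apple Inc"}
merged = {}
for name, amount in Company.items():
    canonical = merge_into.get(name, name)
    merged[canonical] = merged.get(canonical, 0) + amount

print(merged)  # {'Apple Inc': 60.0, 'Banana Co': 5.0}
```

Keeping the raw per-name totals separate from the merge step means a wrong merge decision can be undone by editing merge_into, without re-reading the file.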
