user976856
user976856

Reputation: 175

Python CSV and Sum

I would like to sum amount by a company name, but often format of company name is different.. such as Apple Inc is sometimes Apple computer, Apple Inc. Also.. I don't know how I could handle "header"

My file format is CSV.

company amount
a   20
b   10
A'  30
bb  20

I would like to do like this:

line = readline() if line=='':
break
if 'Apple' in line:
sum(amount)

Upvotes: 0

Views: 424

Answers (2)

Dave
Dave

Reputation: 3924

You're going to need to map the name variations somehow, either by totaling each name separately and combining afterward by hand, or by making a dictionary up front that identifies all the aliases used by each company. if 'Apple' in line: fails hard because it can undetectably mix the amounts from different companies together.

Company = {"Apple": 1, "Apple Computer": 1, "AAPL": 1, "Apple, Inc": 1,
           "Apple Vacations": 2, "Applebee's Restaurant": 3 }

sum[Company[name]] += amount

Edit 2: If you don't know all the company names beforehand, then the best you can do is keep track of the unique names contained in the input file and decide whether to merge them later:

Company = {}
for <name, amount> in file:  # pseudo-code for reading and parsing the input
    if name in Company:
        Company[name] += amount
    else:
        Company[name] = amount

Upvotes: 0

unutbu
unutbu

Reputation: 879421

Your data is not in true CSV format. The apparent columns are not separated by a comma, nor a tab, or even a single space. Sometimes there are multiple spaces... If this were a space-separated-values file, each space would indicate a new column. Multiple spaces would mean you have more than two columns per line.

This detail is important since CSV files are easily parsed by the csv module. But since this is not a true CSV file, we can't use the csv module.

Assuming there are always supposed to be just two columns separated by spaces, and the last column represents a numeric amount (except for the first header line):

total=0
with open('data.csv','r') as f:
    next(f)  # skip the first (header) line 
    for line in f:
        company,amount=line.rsplit(' ',1)
        amount=float(amount)
        if 'Apple' in company:
            total+=amount
print(total)

Upvotes: 2

Related Questions