Reputation: 175
I would like to sum amount by a company name, but often format of company name is different.. such as Apple Inc is sometimes Apple computer, Apple Inc. Also.. I don't know how I could handle "header"
My file format is CSV.
company amount
a 20
b 10
A' 30
bb 20
I would like to do like this:
line = readline() if line=='':
break
if 'Apple' in line:
sum(amount)
Upvotes: 0
Views: 424
Reputation: 3924
You're going to need to map the name variations somehow, either by totaling each name separately and combining afterward by hand, or by making a dictionary up front that identifies all the aliases used by each company. if 'Apple' in line:
fails hard because it can undetectably mix the amounts from different companies together.
Company = {"Apple": 1, "Apple Computer": 1, "AAPL": 1, "Apple, Inc": 1,
"Apple Vacations": 2, "Applebee's Restaurant": 3 }
sum[Company[name]] += amount
Edit 2: If you don't know all the company names beforehand, then the best you can do is keep track of the unique names contained in the input file and decide whether to merge them later:
Company = {}
for <name, amount> in file: # pseudo-code for reading and parsing the input
if name in Company:
Company[name] += amount
else:
Company[name] = amount
Upvotes: 0
Reputation: 879421
Your data is not in true CSV format. The apparent columns are not separated by a comma, nor a tab, or even a single space. Sometimes there are multiple spaces... If this were a space-separated-values file, each space would indicate a new column. Multiple spaces would mean you have more than two columns per line.
This detail is important since CSV files are easily parsed by the csv module. But since this is not a true CSV file, we can't use the csv module.
Assuming there are always supposed to be just two columns separated by spaces, and the last column represents a numeric amount (except for the first header line):
total=0
with open('data.csv','r') as f:
next(f) # skip the first (header) line
for line in f:
company,amount=line.rsplit(' ',1)
amount=float(amount)
if 'Apple' in company:
total+=amount
print(total)
Upvotes: 2