Reputation: 623
I have a CSV file that has list of transactions
For instance,
Year Name Amount
2010 John 10
2011 John 10
2012 John 10
2011 John 10
I want it to be grouped by year and then sorted by ID, which has the expected output
Year Name Amount
2010 John 10
2011 John 20
2012 John 10
My current code is similar to this one
with open('user.csv', 'r', errors='ignore') as csvFile:
reader = csv.reader(csvFile)
for row in sorted(reader):
output.append([row[0], row[1], row[3]])
print("Year", row[0], " Name:" , row[1], "Amount:", row[3])
Thank you
Upvotes: 1
Views: 1241
Reputation: 82765
Pandas is a good option in this use-case. But if you need to use only the build in modules.
Use:
import csv
from collections import defaultdict
result = defaultdict(int)
with open('user.csv') as csvFile:
reader = csv.reader(csvFile) #Note delimiter is `,`
header = next(reader) #Get header
for row in reader:
result[(int(row[0]), row[1])] += int(row[2]) #key = row & name, value = amount
with open(filename_1, "w", newline='') as csvFile:
writer = csv.writer(csvFile)
writer.writerow(header)
for key, Amount in sorted(result.items(), key=lambda x: x[0][0]): #sorted by year
writer.writerow([*key, Amount])
Upvotes: 0
Reputation: 430
Use pandas:
import pandas as pd
#Read csv
df = pd.read_csv("user.csv")
# Groupby and sum
df_new = df.groupby(["Year", "Name"]).agg({"Amount": "sum"}).sort_values(["Year", "Name"]).reset_index()
df_new
Output:
Year Name Amount
0 2010 John 10
1 2011 John 20
2 2012 John 10
Upvotes: 2