Dan
Dan

Reputation: 623

Way to combine csv row values with the same id in python

I have a CSV file that has list of transactions

For instance,

Year    Name       Amount
2010    John       10
2011    John       10
2012    John       10
2011    John       10

I want it to be grouped by year and then sorted by ID, which has the expected output

Year    Name       Amount
2010    John       10
2011    John       20
2012    John       10

My current code is similar to this one

with open('user.csv', 'r', errors='ignore') as csvFile:
     reader = csv.reader(csvFile)
     for row in sorted(reader):
            output.append([row[0], row[1], row[3]])
            print("Year", row[0], "  Name:" , row[1], "Amount:", row[3])

Thank you

Upvotes: 1

Views: 1241

Answers (2)

Rakesh
Rakesh

Reputation: 82765

Pandas is a good option in this use-case. But if you need to use only the build in modules.

Use:

import csv
from collections import defaultdict

result = defaultdict(int)
with open('user.csv') as csvFile:
    reader = csv.reader(csvFile)     #Note delimiter is `,`
    header = next(reader)            #Get header
    for row in reader:
        result[(int(row[0]), row[1])] += int(row[2])  #key = row & name, value = amount

with open(filename_1, "w", newline='') as csvFile:
    writer = csv.writer(csvFile)
    writer.writerow(header) 
    for key, Amount in sorted(result.items(),  key=lambda x: x[0][0]):  #sorted by year
        writer.writerow([*key, Amount])

Upvotes: 0

Iron Hand Odin
Iron Hand Odin

Reputation: 430

Use pandas:

import pandas as pd
#Read csv
df = pd.read_csv("user.csv")

# Groupby and sum
df_new = df.groupby(["Year", "Name"]).agg({"Amount": "sum"}).sort_values(["Year", "Name"]).reset_index()

df_new

Output:

    Year    Name    Amount
0   2010    John    10
1   2011    John    20
2   2012    John    10

Upvotes: 2

Related Questions