Reputation: 23283
I have a .csv with lots of data, generally laid out like this:
Customer City Month Amount
Wayne E Gotham January 111
Wayne E Gotham January 222
Wayne E Chicago March 392
Wayne E Buffalo June 2928
Clark K Krypton January 100
Clark K Amman February 200
Clark K Detroit February 300
I'm trying to create a summary dataframe, that lists each Customer, then the unique cities they're in, and then sum
the Amount
for that month.
So for the above, I'd like my output to look like:
Customer City January February March April May June ... December
Wayne E Gotham 333
Wayne E Chicago 392
Wayne E Buffalo 2928
Clark K Krypton 100
Clark K Amman 200
Clark K Detroit 200
So far I've been able to get the unique customers and cities, but am struggling on how to populate the month columns. I am not even sure I'm setting up my summary dataframe in the best way, so have hit a wall thinking about it.
Here's what I have thus far:
df = pd.read_csv("mycsv.csv", encoding='cp1252')
customers = df["Customer"].unique()
cities = df["City"].unique()
summary_df = pd.DataFrame(columns=["Assured","Facility", "January","February","March","April","May","June","July","August","September", "October", "November","December"])
Upvotes: 0
Views: 32
Reputation: 323316
Are you looking for pivot
?
df.pivot_table(index=['Customer','City'],columns='Month',values='Amount').reindex(columns=['January','February','March','April', 'May','June']).fillna('').reset_index()
Out[83]:
Month Customer City January February March April May June
0 ClarkK Amman 200
1 ClarkK Detroit 300
2 ClarkK Krypton 100
3 WayneE Buffalo 2928
4 WayneE Chicago 392
5 WayneE Gotham 166.5
Upvotes: 1