Reputation: 1743
I'm just getting into Pandas and trying to generate a spreadsheet for a car lot. I'm loving Pandas but it's slow going and I'm trying to generate some new columns that sum ...
import pandas as pd
data = pd.DataFrame({"Car":["Hyundai","Hyundai","Honda", "Honda"], "Type":["Accent", "Accent", "Civic", "Civic"], "Trans":["Auto", "Manual", "Auto", "Manual"], "TOTAL":[2,4,5,3]})
print data
print data.groupby(['Car', 'Type', 'Trans'])['TOTAL'].sum()
I'm getting the totally predictable ....
Car TOTAL Trans Type
0 Hyundai 2 Auto Accent
1 Hyundai 4 Manual Accent
2 Honda 5 Auto Civic
3 Honda 3 Manual Civic
Car Type Trans
Honda Civic Auto 5
Manual 3
Hyundai Accent Auto 2
Manual 4
Ideally what I'd love to pull off is.....
Car Type Auto Manual Total
Honda Civic 5 3 8
Hyundai Accent 2 4 6
My knowledge isn't that great of Pandas (yet), but I'm guessing it's an "apply" or an agg() function but so far, syntactically, I'm banging my head from the syntax errors, but I appreciate any pointers in the right direction. .. JW
Upvotes: 1
Views: 247
Reputation: 8270
You can prepare two new series ahead of time in the dataframe with auto and manual counts.
data['total_manual'] = data['TOTAL'] * (data['Trans'] == 'Manual').astype(int)
data['total_auto'] = data['TOTAL'] * (data['Trans'] == 'Auto').astype(int)
print data.groupby(['Car', 'Type'])['total_auto', 'total_manual', 'TOTAL'].sum()
Also a similar approach is to use a pivot table with margins.
pvt = pd.pivot_table(data, index=['Car', 'Type'], columns='Trans', values='TOTAL', margins='columns', aggfunc=np.sum)
pvt = pvt.drop(('All',''), axis=0)
Upvotes: 1
Reputation: 42875
To use the built-in pandas
methods, you could: set your 'Car', 'Type', 'Trans'
columns
as index and unstack()
to get the Total
for each subgroup, then just sum over the columns
:
data = pd.DataFrame({"Car":["Hyundai","Hyundai","Honda", "Honda"], "Type":["Accent", "Accent", "Civic", "Civic"], "Trans":["Auto", "Manual", "Auto", "Manual"], "TOTAL":[2,4,5,3]}).set_index(['Car', 'Type', 'Trans'])
total_by_trans = data.unstack().loc[:, 'TOTAL'] # to get rid of the column MultiIndex created by unstack()
total_by_trans['Total'] = total_by_trans.sum(axis=1)
total_by_trans.columns.name = None # just cleaning up
Auto Manual Total
Car Type
Honda Civic 5 3 8
Hyundai Accent 2 4 6
Upvotes: 3