Reputation: 339
I'm trying to spread one column into several columns and sum its contents accordingly, i.e. reshape the dataframe from long to wide. For example, we have a column named Year with values from 2014 to 2016, and a column Sales with an amount. What I want is to turn Year into the columns 2014, 2015 and 2016, each holding the sum of Sales for that specific year. The original Sales column can either be dropped or show the total sales over all years.
I've tried to come up with a solution using pandas' groupby(), agg() and transform(), to no avail. That is, I cannot find a way to create the 2014, 2015 and 2016 columns.
Assume the following dataframe:
import numpy as np
import pandas as pd

df = pd.DataFrame({'CustomerId': [1,1,1,2,2,2,3,3,3,4,4,4,5,5,5],
                   'CustomerName': ['McNulty','McNulty','McNulty',
                                    'Bunk','Bunk','Bunk',
                                    'Joe','Joe','Joe',
                                    'Rawls','Rawls','Rawls',
                                    'Davis','Davis','Davis'],
                   'Sales': np.random.randint(1000,1500,15),
                   'Year': [2014,2015,2016,2014,2015,2016,2014,2015,2016,
                            2014,2015,2016,2014,2015,2016]})
The expected output should be as follows:
CustomerId  CustomerName  Sales  2014  2015  2016
1           McNulty       3300   1050  1050  1200
2           Bunk          3500   1100  1200  1200
3           Joe           3900   1300  1300  1300
4           Rawls         3500   1000  1000  1500
5           Davis         3800   1600  1100  1100
Upvotes: 2
Views: 189
Reputation: 42946
Using pivot_table, then flattening the MultiIndex columns, and finally calculating the sum over axis=1:
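# aggfunc defaults to 'mean'; pass 'sum' so repeated customer/year rows are totalled
piv = df.pivot_table(index=['CustomerId', 'CustomerName'], columns='Year', aggfunc='sum').reset_index()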
piv.columns = [f'{c1}_{c2}'.strip('_') for c1, c2 in piv.columns]
piv['Sales'] = piv.filter(like='Sales').sum(axis=1)
Output
   CustomerId  CustomerName  Sales_2014  Sales_2015  Sales_2016  Sales
0           1       McNulty        1144        1007        1108   3259
1           2          Bunk        1146        1451        1169   3766
2           3           Joe        1455        1070        1351   3876
3           4         Rawls        1263        1004        1422   3689
4           5         Davis        1428        1431        1399   4258
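Because the sample data is random, the numbers above differ on every run. To see what the flattening step does on known values, here is a minimal sketch on a small fixed frame. The data is an assumption for illustration (the first two customers from the question's expected output), and raw_columns and flat_columns are hypothetical helper names. aggfunc='sum' is passed explicitly because pivot_table averages by default.

```python
import pandas as pd

# Hypothetical fixed data, taken from the question's expected output,
# so the result is reproducible.
df = pd.DataFrame({
    'CustomerId': [1, 1, 1, 2, 2, 2],
    'CustomerName': ['McNulty'] * 3 + ['Bunk'] * 3,
    'Sales': [1050, 1050, 1200, 1100, 1200, 1200],
    'Year': [2014, 2015, 2016] * 2,
})

piv = df.pivot_table(index=['CustomerId', 'CustomerName'],
                     columns='Year', aggfunc='sum').reset_index()

# Without an explicit `values` argument the columns are two-level tuples,
# e.g. ('CustomerId', '') and ('Sales', 2014).
raw_columns = list(piv.columns)

# Join both levels; strip('_') removes the trailing underscore that the
# empty second level leaves on the index columns.
piv.columns = [f'{c1}_{c2}'.strip('_') for c1, c2 in piv.columns]
flat_columns = list(piv.columns)

# Row-wise total over the per-year columns only.
piv['Sales'] = piv.filter(like='Sales').sum(axis=1)
print(piv)
```

Note that the total has to be computed while the only columns containing 'Sales' are the per-year ones; running the last line a second time would include the existing Sales total in the sum.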
Upvotes: 2
Reputation: 18647
You can use DataFrame.pivot_table:
df.pivot_table(index=['CustomerId', 'CustomerName'],
               columns=['Year'],
               values='Sales',
               margins=True,
               margins_name='Sales',
               aggfunc='sum').reset_index().iloc[:-1]
[out]
Year  CustomerId  CustomerName  2014  2015  2016  Sales
0              1       McNulty  1006  1325  1205   3536
1              2          Bunk  1267  1419  1257   3943
2              3           Joe  1348  1217  1323   3888
3              4         Rawls  1091  1390  1330   3811
4              5         Davis  1075  1316  1481   3872
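The trailing .iloc[:-1] is there because margins=True adds a grand-total row as well as the total column, and both are labelled with margins_name. A minimal sketch on fixed data shows this; the data is an assumption (two customers from the question's expected output) and the names full and result are hypothetical.

```python
import pandas as pd

# Hypothetical fixed data, taken from the question's expected output.
df = pd.DataFrame({
    'CustomerId': [1, 1, 1, 2, 2, 2],
    'CustomerName': ['McNulty'] * 3 + ['Bunk'] * 3,
    'Sales': [1050, 1050, 1200, 1100, 1200, 1200],
    'Year': [2014, 2015, 2016] * 2,
})

full = df.pivot_table(index=['CustomerId', 'CustomerName'],
                      columns=['Year'],
                      values='Sales',
                      margins=True,
                      margins_name='Sales',
                      aggfunc='sum').reset_index()

# The last row is the grand total over all customers; its CustomerId
# cell holds the margins label rather than an id.
total_row = full.iloc[-1]

# Dropping that row leaves one row per customer plus the per-row total.
result = full.iloc[:-1]
print(result)
```

One side effect to be aware of: because the total row puts the string 'Sales' into CustomerId, that column may no longer be an integer column after the slice, so a cast back with astype(int) could be needed if the dtype matters.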
Upvotes: 2