Reputation: 21
I want to reshape my dataframe so I can pivot the 'kind' field, but I also want to include per-row aggregations.
df = pd.DataFrame([
{
'date': '2022-04-20',
'kind': 'alpha',
'scalar_a': 2,
'scalar_b': 5
},
{
'date': '2022-04-20',
'kind': 'bravo',
'scalar_a': 3,
'scalar_b': 7
},
{
'date': '2022-04-21',
'kind': 'charlie',
'scalar_a': 4,
'scalar_b': 3
},
{
'date': '2022-04-22',
'kind': 'bravo',
'scalar_a': 5,
'scalar_b': 1
},
])
I want to:
My attempt was to create two dataframes (one for calculating the per-date totals, and another one to perform the pivot transformation), and then concatenate them across the horizontal axis.
totals_df = df.groupby('date').agg(
total_a=('scalar_a', 'sum'),
total_b=('scalar_b', 'sum'),
)
# I also need a column calculated by the aggregated fields.
totals_df["Total a*b"] = totals_df["total_a"] * totals_df["total_b"]
# Then I sort by descending date.
totals_df = totals_df.sort_values('date',ascending=False)
# And then to build my pivoted dataframe
pivot_df = df.pivot_table(
index=['date'],
columns=['kind'],
fill_value=0
)
# I attempt to invert the two top-level headers ([scalar_a, scalar_b] with [])
pivot_fundos = pivot_fundos.swaplevel(1,0, axis=1).sort_index(axis=1)
How would I proceed to include a per-date/per-kind aggregation? I want to include a column with scalar_a
+ scalar_b
for each kind sub-column.
Then I concatenate both dataframes.
concatenated_dataframes = pd.concat([pivot_df, totals_df], axis=1)
The resulting dataframe doesn't have the "two-levels" headers I expected when calling the to_table()
method. How do I fix that?
Upvotes: 1
Views: 314
Reputation: 260845
I would approach it this way. First pivot_table
, then use groupby
on columns to compute the sums, finally concat
everything (with inner concat to add the missing levels):
df2 = df.pivot_table(index='date', columns='kind',
values=['scalar_a', 'scalar_b'], fill_value=0)
df3 = (df2
.groupby(level=0, axis=1).sum().add_prefix('total_')
.assign(**{'Total a*b': lambda d: d.prod(1)})
)
out = pd.concat(
[df2.swaplevel(axis=1),
pd.concat({'sum': df2.groupby(level=1, axis=1).sum()}, axis=1)
.swaplevel(axis=1),
pd.concat({'Totals': df3}, axis=1),
], axis=1).sort_index(axis=1)
output:
kind Totals alpha bravo charlie
Total a*b total_scalar_a total_scalar_b scalar_a scalar_b sum scalar_a scalar_b sum scalar_a scalar_b sum
date
2022-04-20 60 5 12 2 5 7 3 7 10 0 0 0
2022-04-21 12 4 3 0 0 0 0 0 0 4 3 7
2022-04-22 5 5 1 0 0 0 5 1 6 0 0 0
df2 = df.pivot_table(index='date', columns='kind',
values=['scalar_a', 'scalar_b'], fill_value=0)
df3 = (df2
.groupby(level=0, axis=1).sum().add_prefix('total_')
.assign(**{'Total a*b': lambda d: d.prod(1)})
)
df3 = pd.concat({'Totals': df3}, axis=1)
idx = pd.MultiIndex.from_product([df['kind'].unique(),list(df.filter(like='scalar'))+['sum']])
out = pd.concat(
[df2.swaplevel(axis=1),
pd.concat({'sum': df2.groupby(level=1, axis=1).sum()}, axis=1)
.swaplevel(axis=1),
df3,
], axis=1)[list(idx)+list(df3)]
Output:
kind alpha bravo charlie Totals
scalar_a scalar_b sum scalar_a scalar_b sum scalar_a scalar_b sum total_scalar_a total_scalar_b Total a*b
date
2022-04-20 2 5 7 3 7 10 0 0 0 5 12 60
2022-04-21 0 0 0 0 0 0 4 3 7 4 3 12
2022-04-22 0 0 0 5 1 6 0 0 0 5 1 5
Upvotes: 1
Reputation: 4543
Use:
df2 = df.pivot(index='date', columns='kind', values=['scalar_a', 'scalar_b']).swaplevel(1,0, axis=1).sort_index(axis=1)
cols = [( x, 'sum') for x in df['kind'].unique()]
df2[cols]=df2.sum(axis=1, level=0).sort_index(axis = 1)
Output:
Upvotes: 0