Reputation: 465
I have a Pandas DataFrame of transactions:
transactions.head():
   Amount Date of Transaction             Description      Purchase_Type year_month
0   39.95          2017-03-30    Fake_Transaction_One      Miscellaneous    2017-03
1    2.39          2017-04-01    Fake_Transaction_Two       tool_expense    2017-04
2    8.03          2017-04-01  Fake_Transaction_Three  food_and_domestic    2017-04
3   34.31          2017-04-01   Fake_Transaction_Four  food_and_domestic    2017-04
4   10.56          2017-04-03   Fake_Transaction_Five  food_and_domestic    2017-04
I run a groupby command on this DataFrame:
grouped_transactions = transactions.groupby(['Purchase_Type','year_month'])['Amount'].sum()
Which produces a groupby object:
Purchase_Type   year_month
tool_expense    2017-04       72.49
Calendar_Event  2017-08        3.94
                2017-12       23.92
                2018-02       42.91
                2018-03       10.91
I want to run groupby commands on this such as
grouped_transactions.groups.keys()
However, I am unable to, as the object is not a groupby object but rather a Series:
In: type(grouped_transactions)
Out: pandas.core.series.Series
Looking at grouped_transactions, it appears to be a groupby object, not a Series. Further, it was created by running the .groupby method on a Pandas DataFrame. As such, I am unsure why it is a Series.
What is the error in my understanding or my approach?
Upvotes: 3
Views: 7765
Reputation: 1
Use this:
grouped_transactions = transactions.groupby(['Purchase_Type','year_month'])[['Amount']].sum()
If you use double brackets, a list of column names is passed to the indexing operation, so the aggregated result is a DataFrame rather than a Series.
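As a rough illustration of the single versus double bracket difference, here is a minimal sketch. The `transactions` frame below is made-up sample data modelled on the question, not the asker's real data. Note that neither result is a groupby object; both are already aggregated.

```python
import pandas as pd

# Hypothetical sample rows mirroring the columns in the question.
transactions = pd.DataFrame({
    "Purchase_Type": ["Miscellaneous", "tool_expense",
                      "food_and_domestic", "food_and_domestic"],
    "year_month": ["2017-03", "2017-04", "2017-04", "2017-04"],
    "Amount": [39.95, 2.39, 8.03, 34.31],
})

keys = ["Purchase_Type", "year_month"]

# Single brackets select one column, so the aggregate is a Series.
as_series = transactions.groupby(keys)["Amount"].sum()

# Double brackets select a list of columns, so the aggregate is a DataFrame.
as_frame = transactions.groupby(keys)[["Amount"]].sum()

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
```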
Upvotes: 0
Reputation: 164713
Selecting a column from a GroupBy object returns another GroupBy object (a SeriesGroupBy), while calling an aggregation method on it returns a Series or DataFrame. Best practice: if you need the keys as well as an aggregation, assign your GroupBy object to a variable and then perform multiple operations on that object.
Below are some examples.
import pandas as pd
df = pd.DataFrame([['A', 'B', 1], ['A', 'B', 2], ['A', 'C', 3]])
g = df.groupby([0, 1])
# <pandas.core.groupby.groupby.DataFrameGroupBy object at 0x0000000007E76AC8>
keys = g.groups.keys()
# dict_keys([('A', 'B'), ('A', 'C')])
sums_df = g.sum()
# <class 'pandas.core.frame.DataFrame'>
sums_series_group = g[2]
# <class 'pandas.core.groupby.groupby.SeriesGroupBy'>
sums_series = g[2].sum()
# <class 'pandas.core.series.Series'>
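Applied to the question's column names, the same pattern might look like the sketch below. The sample rows are assumptions for illustration only. It also shows that the group keys are still available on the aggregated Series through its index, which may be enough if the keys are all that is needed.

```python
import pandas as pd

# Hypothetical sample rows mirroring the columns in the question.
transactions = pd.DataFrame({
    "Purchase_Type": ["Miscellaneous", "tool_expense",
                      "food_and_domestic", "food_and_domestic"],
    "year_month": ["2017-03", "2017-04", "2017-04", "2017-04"],
    "Amount": [39.95, 2.39, 8.03, 34.31],
})

# Keep the GroupBy object in its own variable so it can be reused.
g = transactions.groupby(["Purchase_Type", "year_month"])

# Group keys come from the GroupBy object itself.
group_keys = list(g.groups.keys())

# Aggregating returns a Series indexed by a MultiIndex of the same keys.
totals = g["Amount"].sum()
index_keys = list(totals.index)

print(group_keys)
print(index_keys)
```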
Upvotes: 2
Reputation: 862911
It is expected behaviour to get a Series or DataFrame when methods are chained, such as groupby followed by an aggregate function.
If you need the groupby object:
g = transactions.groupby(['Purchase_Type','year_month'])
print (g)
<pandas.core.groupby.groupby.DataFrameGroupBy object at 0x00000000191EA5C0>
But if you need to convert the MultiIndex created by the aggregation to columns:
df = transactions.groupby(['Purchase_Type','year_month'], as_index=False)['Amount'].sum()
Or:
df = transactions.groupby(['Purchase_Type','year_month'])['Amount'].sum().reset_index()
print (df)
       Purchase_Type year_month  Amount
0      Miscellaneous    2017-03   39.95
1  food_and_domestic    2017-04   52.90
2       tool_expense    2017-04    2.39
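A small sketch comparing the two approaches on made-up rows (the data is an assumption for illustration) suggests they give the same flat DataFrame, so the choice is mostly a matter of style:

```python
import pandas as pd

# Hypothetical sample rows mirroring the columns in the question.
transactions = pd.DataFrame({
    "Purchase_Type": ["Miscellaneous", "tool_expense",
                      "food_and_domestic", "food_and_domestic"],
    "year_month": ["2017-03", "2017-04", "2017-04", "2017-04"],
    "Amount": [39.95, 2.39, 8.03, 34.31],
})

keys = ["Purchase_Type", "year_month"]

# Option 1: keep the grouping keys as ordinary columns from the start.
flat_a = transactions.groupby(keys, as_index=False)["Amount"].sum()

# Option 2: aggregate to a MultiIndex Series, then move the index into columns.
flat_b = transactions.groupby(keys)["Amount"].sum().reset_index()

print(flat_a.columns.tolist())
print(flat_b.columns.tolist())
```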
Upvotes: 5