acolls_badger

Reputation: 465

Pandas groupby producing a Series, not a GroupBy object

I have a Pandas DataFrame of transactions:

transactions.head():

   Amount  Date of Transaction  Description             Purchase_Type      year_month
0   39.95  2017-03-30           Fake_Transaction_One    Miscellaneous      2017-03
1    2.39  2017-04-01           Fake_Transaction_Two    tool_expense       2017-04
2    8.03  2017-04-01           Fake_Transaction_Three  food_and_domestic  2017-04
3   34.31  2017-04-01           Fake_Transaction_Four   food_and_domestic  2017-04
4   10.56  2017-04-03           Fake_Transaction_Five   food_and_domestic  2017-04

I run a groupby command on this DataFrame:

grouped_transactions = transactions.groupby(['Purchase_Type','year_month'])['Amount'].sum()

This produces the following output, which looks like a grouped object:

Purchase_Type        year_month
tool_expense         2017-04       72.49
Calendar_Event       2017-08        3.94
                     2017-12       23.92
                     2018-02       42.91
                     2018-03       10.91

I want to use GroupBy attributes and methods on this, such as:

grouped_transactions.groups.keys()

However, I can't, because the object is not a GroupBy object but a Series:

In: type(grouped_transactions)
Out: pandas.core.series.Series

Looking at grouped_transactions, it appears to be a GroupBy object, not a Series. Furthermore, it was created by calling the .groupby method on a Pandas DataFrame, so I am unsure why it is a Series.

What is the error in my understanding or my approach?

Upvotes: 3

Views: 7765

Answers (3)

Real1Minshen

Reputation: 1

Use this:

grouped_transactions = transactions.groupby(['Purchase_Type','year_month'])[['Amount']].sum()

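With double brackets, the inner brackets create the list ['Amount'], and that list is passed to the GroupBy indexing operator. Selecting a list of columns gives a DataFrameGroupBy instead of a SeriesGroupBy, so sum() returns a one-column DataFrame instead of a Series.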
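A quick way to see what the double brackets change is to compare the result types. This is a minimal sketch, assuming a trimmed stand-in for `transactions` built from the `head()` rows in the question (only the columns the grouping needs). Note that both results are already aggregated, so neither one is a GroupBy object.

```python
import pandas as pd

# Assumed stand-in for the question's transactions, using the head() values shown there
transactions = pd.DataFrame({
    "Amount": [39.95, 2.39, 8.03, 34.31, 10.56],
    "Purchase_Type": ["Miscellaneous", "tool_expense", "food_and_domestic",
                      "food_and_domestic", "food_and_domestic"],
    "year_month": ["2017-03", "2017-04", "2017-04", "2017-04", "2017-04"],
})
keys = ["Purchase_Type", "year_month"]

# Single brackets: one column -> SeriesGroupBy, so sum() returns a Series
single = transactions.groupby(keys)["Amount"].sum()

# Double brackets: a list of columns -> DataFrameGroupBy, so sum() returns a DataFrame
double = transactions.groupby(keys)[["Amount"]].sum()

print(type(single).__name__)  # Series
print(type(double).__name__)  # DataFrame
print(list(double.columns))   # ['Amount']
```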

Upvotes: 0

jpp

Reputation: 164713

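Calling an aggregation method such as sum() on a GroupBy object returns a Series or DataFrame. Indexing it by column only narrows it to another GroupBy (SeriesGroupBy or DataFrameGroupBy); the aggregation that follows is what produces the Series. Best practice: if you need the keys as well as the aggregation, assign your GroupBy object to a variable and then perform multiple operations on that object.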

Below are some examples.

import pandas as pd

df = pd.DataFrame([['A', 'B', 1], ['A', 'B', 2], ['A', 'C', 3]])

g = df.groupby([0, 1])
# <pandas.core.groupby.groupby.DataFrameGroupBy object at 0x0000000007E76AC8>

keys = g.groups.keys()
# dict_keys([('A', 'B'), ('A', 'C')])

sums_df = g.sum()
# <class 'pandas.core.frame.DataFrame'>

sums_series_group = g[2]
# <class 'pandas.core.groupby.groupby.SeriesGroupBy'>

sums_series = g[2].sum()
# <class 'pandas.core.series.Series'>
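The same stored GroupBy object can serve several more operations. A sketch on the toy `df` from this answer, using the standard `get_group` method and plain iteration over the groups; the printed values apply to this toy frame only.

```python
import pandas as pd

df = pd.DataFrame([['A', 'B', 1], ['A', 'B', 2], ['A', 'C', 3]])

# Keep the GroupBy itself; aggregate only when a concrete result is needed
g = df.groupby([0, 1])

keys = list(g.groups.keys())           # [('A', 'B'), ('A', 'C')]
sums = g[2].sum()                      # Series indexed by the group keys
first_group = g.get_group(('A', 'B'))  # the rows belonging to one group

# Iterating yields (key, sub-DataFrame) pairs
for key, group in g:
    print(key, group[2].tolist())
# ('A', 'B') [1, 2]
# ('A', 'C') [3]
```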

Upvotes: 2

jezrael

Reputation: 862911

This is expected behaviour: when groupby is chained with an aggregation function, the result is a Series or DataFrame, not a GroupBy object.
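If the aggregated Series is all you need, the group keys are not lost: they become the Series' MultiIndex, so they can be read from the index instead of from `.groups`. A minimal sketch, assuming a stand-in `transactions` built from the question's `head()` rows:

```python
import pandas as pd

# Assumed stand-in for the question's transactions (head() values only)
transactions = pd.DataFrame({
    "Amount": [39.95, 2.39, 8.03, 34.31, 10.56],
    "Purchase_Type": ["Miscellaneous", "tool_expense", "food_and_domestic",
                      "food_and_domestic", "food_and_domestic"],
    "year_month": ["2017-03", "2017-04", "2017-04", "2017-04", "2017-04"],
})

grouped_transactions = transactions.groupby(["Purchase_Type", "year_month"])["Amount"].sum()

# The group keys survive as the Series' MultiIndex
group_keys = grouped_transactions.index.tolist()
print(group_keys)
# [('Miscellaneous', '2017-03'), ('food_and_domestic', '2017-04'), ('tool_expense', '2017-04')]

# Look up one group's total by its key tuple
print(grouped_transactions.loc[("food_and_domestic", "2017-04")])
```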

If you need the GroupBy object, stop before the aggregation:

g = transactions.groupby(['Purchase_Type','year_month'])
print (g)
<pandas.core.groupby.groupby.DataFrameGroupBy object at 0x00000000191EA5C0>

But if you need to convert a MultiIndex created by aggregation to columns:

df = transactions.groupby(['Purchase_Type','year_month'], as_index=False)['Amount'].sum()

Or:

df = transactions.groupby(['Purchase_Type','year_month'])['Amount'].sum().reset_index()

print (df)
       Purchase_Type year_month  Amount
0      Miscellaneous    2017-03   39.95
1  food_and_domestic    2017-04   52.90
2       tool_expense    2017-04    2.39
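The two forms above should produce the same flat DataFrame. A small check, assuming the same stand-in `transactions` rows from the question's `head()`; `pd.testing.assert_frame_equal` raises an `AssertionError` if the frames differ.

```python
import pandas as pd

# Assumed stand-in for the question's transactions (head() values only)
transactions = pd.DataFrame({
    "Amount": [39.95, 2.39, 8.03, 34.31, 10.56],
    "Purchase_Type": ["Miscellaneous", "tool_expense", "food_and_domestic",
                      "food_and_domestic", "food_and_domestic"],
    "year_month": ["2017-03", "2017-04", "2017-04", "2017-04", "2017-04"],
})
keys = ["Purchase_Type", "year_month"]

# Option 1: keep the group keys as ordinary columns from the start
flat_a = transactions.groupby(keys, as_index=False)["Amount"].sum()

# Option 2: aggregate to a MultiIndexed Series, then move the index levels into columns
flat_b = transactions.groupby(keys)["Amount"].sum().reset_index()

print(list(flat_a.columns))  # ['Purchase_Type', 'year_month', 'Amount']
pd.testing.assert_frame_equal(flat_a, flat_b)  # raises if the two results differ
```

as_index=False skips building the MultiIndex at all, while reset_index() is handy when you already have the aggregated Series.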

Upvotes: 5
