Vince Miller

Reputation: 300

How do I access data inside a pandas dataframe groupby object?

df_grouped was created with the following code:

df_grouped = df.groupby(by='Pclass')

The loop below prints each Pclass value along with the number of rows in its group.

for val,grp in df_grouped:
    print('There were',len(grp),'people traveling in',val,'class.')

How does the code access the information? How can val and grp be used without being defined earlier? How is this information stored inside the groupby object?

Upvotes: 3

Views: 5951

Answers (2)

rahlf23

Reputation: 9019

Referencing the docs: "The groups attribute is a dict whose keys are the computed unique groups and corresponding values being the axis labels belonging to each group"

You may be interested in looking into .agg(), for example:

import pandas as pd

df = pd.DataFrame([['Person A', 2, 3, 4],
                ['Person B', 3, 2, 1],
                ['Person C', 5, 7, 5],
                ['Person A', 3, 4, 9],
                ['Person C', 8, 3, 2]],
                columns=['Person','Val 1','Val 2','Val 3'])

Gives the following dataframe:

     Person  Val 1  Val 2  Val 3
0  Person A      2      3      4
1  Person B      3      2      1
2  Person C      5      7      5
3  Person A      3      4      9
4  Person C      8      3      2

Then grouping by Person and calling agg:

df.groupby('Person').agg({'Val 1': 'sum', 'Val 2': 'mean', 'Val 3': 'count'})

Gives:

          Val 1  Val 2  Val 3
Person                       
Person A      5    3.5      2
Person B      3    2.0      1
Person C     13    5.0      2

Here, the dictionary passed to agg maps each column name to the operation that should be applied to that column within each group.
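If you also want to control the names of the output columns, pandas 0.25 and later accept keyword arguments in agg ("named aggregation"). The sketch below reuses the same sample data; the output names val1_sum, val2_mean and val3_count are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([['Person A', 2, 3, 4],
                   ['Person B', 3, 2, 1],
                   ['Person C', 5, 7, 5],
                   ['Person A', 3, 4, 9],
                   ['Person C', 8, 3, 2]],
                  columns=['Person', 'Val 1', 'Val 2', 'Val 3'])

# Each keyword is an output column name (hypothetical names here);
# each value is a (source column, aggregation) pair.
summary = df.groupby('Person').agg(
    val1_sum=('Val 1', 'sum'),
    val2_mean=('Val 2', 'mean'),
    val3_count=('Val 3', 'count'),
)
print(summary)
```

The numbers are the same as in the dictionary version above; only the column labels differ.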

Upvotes: 1

sacuL

Reputation: 51335

As noted in the Group By: split-apply-combine documentation, the data are stored in a GroupBy object, which is a data structure with special attributes.

You can verify this for yourself:

>>> type(df_grouped)

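This should return something like the following (the exact module path depends on your pandas version; newer releases report pandas.core.groupby.generic.DataFrameGroupBy):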

<class 'pandas.core.groupby.DataFrameGroupBy'>

The structure of the data is well explained by this snippet from the docs:

The groups attribute is a dict whose keys are the computed unique groups and corresponding values being the axis labels belonging to each group.
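To make that description concrete, here is a minimal sketch. The small DataFrame is made up and only stands in for the Titanic data from the question; the Name column and its values are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the question's data
df = pd.DataFrame({
    "Pclass": [1, 3, 3, 2, 1, 3],
    "Name": ["A", "B", "C", "D", "E", "F"],
})
df_grouped = df.groupby(by="Pclass")

# .groups maps each group key to the index labels of the rows in that group
groups = {int(key): labels.tolist() for key, labels in df_grouped.groups.items()}
print(groups)  # {1: [0, 4], 2: [3], 3: [1, 2, 5]}

# .get_group returns the rows of a single group as an ordinary DataFrame
third_class = df_grouped.get_group(3)
print(third_class)
```

So the groupby object does not hold copies of the data per group up front; it keeps the original frame plus a mapping from each key to the matching row labels, and builds each sub-DataFrame when you ask for it.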

As you noticed, you can easily iterate through each individual group. However, there are often vectorized methods that work directly on groupby objects and compute the same results more efficiently than an explicit loop.
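On the question of where val and grp come from: iterating over a groupby object yields (key, sub-DataFrame) tuples, and the for statement unpacks each tuple into the two names, so they are defined by the loop itself. The sketch below shows this next to the vectorized size() method, using a made-up DataFrame in place of the Titanic data.

```python
import pandas as pd

# Hypothetical stand-in for the question's data
df = pd.DataFrame({"Pclass": [1, 3, 3, 2, 1, 3]})
df_grouped = df.groupby(by="Pclass")

# Each item produced by the iteration is a 2-tuple: (group key, DataFrame of that group).
# "for val, grp in ..." is ordinary tuple unpacking, the same as with dict.items().
loop_counts = {}
for val, grp in df_grouped:
    loop_counts[val] = len(grp)
    print('There were', len(grp), 'people traveling in', val, 'class.')

# Vectorized equivalent: one call returns the row count of every group as a Series
size_counts = df_grouped.size()
print(size_counts)
```

Both approaches give the same counts here; size() simply avoids the Python-level loop.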

Upvotes: 3
