Reputation: 300
Using the following code df_grouped was created.
df_grouped = df.groupby(by='Pclass')
Below a loop prints the Pclass value as well as the length of each grouped amount.
for val,grp in df_grouped:
print('There were',len(grp),'people traveling in',val,'class.')
How does the code access the information? How can val & grp be used without being referenced earlier? How is this information stored inside the groupby object?
Upvotes: 3
Views: 5951
Reputation: 9019
Referencing the docs: "The groups attribute is a dict whose keys are the computed unique groups and corresponding values being the axis labels belonging to each group"
You may be interested in looking into .agg()
, for example:
df = pd.DataFrame([['Person A', 2, 3, 4],
['Person B', 3, 2, 1],
['Person C', 5, 7, 5],
['Person A', 3, 4, 9],
['Person C', 8, 3, 2]],
columns=['Person','Val 1','Val 2','Val 3'])
Gives the following dataframe:
Person Val 1 Val 2 Val 3
0 Person A 2 3 4
1 Person B 3 2 1
2 Person C 5 7 5
3 Person A 3 4 9
4 Person C 8 3 2
Then doing a groupyby
and agg
:
df.groupby('Person').agg({'Val 1': 'sum', 'Val 2': 'mean', 'Val 3': 'count'})
Gives:
Val 1 Val 2 Val 3
Person
Person A 5 3.5 2
Person B 3 2.0 1
Person C 13 5.0 2
Here you can simply pass a dictionary to agg
that specifies operations that you would like to perform on each group for a specific column.
Upvotes: 1
Reputation: 51335
As noted in the Group By: split-apply-combine documentation, the data are stored in a GroupBy object
, which is a data structure with special attributes.
You can verify this for yourself:
>>> type(df_grouped)
Should return:
<class 'pandas.core.groupby.DataFrameGroupBy'>
The structure of the data is well explained by this snippet from the docs:
The groups attribute is a dict whose keys are the computed unique groups and corresponding values being the axis labels belonging to each group.
As you noticed, you can easily iterate through each individual group. However, there are often vectorized methods that work very nicely with groupby
objects, and can access information and calculate things much more effectively and quickly.
Upvotes: 3