Reputation: 3379
I have a dataframe where I want to group by the first part of an ID field. For example, say I have the following:
>>> import pandas as pd
>>> df=pd.DataFrame(data=[['AA',1],['AB',4],['AC',5],['BA',11],['BB',2],['CA',9]], columns=['ID','Value'])
>>> df
ID Value
0 AA 1
1 AB 4
2 AC 5
3 BA 11
4 BB 2
5 CA 9
>>>
How can I group by the first letter of the ID field?
I can currently do this by creating a new column and then grouping on that, but I imagine there is a more efficient way:
>>> df['GID']=df['ID'].str[:1]
>>> df.groupby('GID')['Value'].sum()
GID
A 10
B 13
C 9
Name: Value, dtype: int64
>>>
Upvotes: 6
Views: 5801
Reputation: 142166
You will need to create a grouping key somehow, just not necessarily on the DataFrame itself, for eg:
df.groupby(df.ID.str[:1])['Value'].sum()
Upvotes: 10