Reputation: 351
I have a .csv file like the following:
Population Region
1001 Rigolet (N.L.)
2000 Nain (N.L.)
3000 Lot 63 (P.E.I.)
4000 Lot 53 (P.E.I.)
5000 Burnt Islands (N.L.)
6000 Burgeo (N.L.)
7000 Ham-Nord (Que.)
8000 Chesterville (Que.)
1000 Warwick (Que.)
9000 Prince (Ont.)
1002 Wawa (Ont.)
I'd like to group the rows by the parenthesized suffix at the end of each string in the Region column, such as '(N.L.)' or '(Ont.)'.
How could I do this?
Thanks a lot!
Upvotes: 0
Views: 29
Reputation: 42916
Use Series.str.rsplit with n=1, so that each string is split once on the first whitespace from the right, and take the last piece. Then group by these values:
grps = df['Region'].str.rsplit(n=1).str[-1]
df.groupby(grps)['Population'].sum()  # sum() is only an example; use whatever aggregation you need
When we print grps:
print(grps)
0 (N.L.)
1 (N.L.)
2 (P.E.I.)
3 (P.E.I.)
4 (N.L.)
5 (N.L.)
6 (Que.)
7 (Que.)
8 (Que.)
9 (Ont.)
10 (Ont.)
Name: Region, dtype: object
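For completeness, here is a self-contained sketch of the whole thing on the sample data. It assumes the file is comma-separated (the question doesn't show the actual delimiter) and uses sum() on Population purely as an example aggregation. The last part uses str.extract with a regex, which is a different technique from rsplit and is only there in case you want the keys without the parentheses.

```python
import io

import pandas as pd

# Sample data from the question; the comma delimiter is an assumption.
raw = """Population,Region
1001,Rigolet (N.L.)
2000,Nain (N.L.)
3000,Lot 63 (P.E.I.)
4000,Lot 53 (P.E.I.)
5000,Burnt Islands (N.L.)
6000,Burgeo (N.L.)
7000,Ham-Nord (Que.)
8000,Chesterville (Que.)
1000,Warwick (Que.)
9000,Prince (Ont.)
1002,Wawa (Ont.)
"""
df = pd.read_csv(io.StringIO(raw))

# Split once from the right on whitespace and keep the last piece, e.g. '(N.L.)'.
grps = df['Region'].str.rsplit(n=1).str[-1]

# Example aggregation: total population per group.
totals = df.groupby(grps)['Population'].sum()
print(totals)

# Alternative (regex instead of rsplit): capture the text inside the
# trailing parentheses, giving keys like 'N.L.' without the brackets.
codes = df['Region'].str.extract(r'\(([^()]*)\)\s*$', expand=False)
totals_by_code = df.groupby(codes)['Population'].sum()
print(totals_by_code)
```

The rsplit version relies on the suffix itself containing no spaces, which holds for the sample rows; the regex version does not depend on that.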
Upvotes: 1