fazistinho_

Reputation: 335

Python Groupby part of a string

I'm grouping a list of transactions by UK postcode, but I only want to group by the first part of the postcode. UK postcodes have two parts, outward and inward, separated by a space, e.g. W1 5DA.

subtotals = df.groupby('Postcode').count()

That is how I'm doing it now. The approach I've thought of so far is to add another column to the DataFrame containing just the first word of the Postcode column, and then group by that, but I'm wondering if there's an easier way to do it.

Thank you

Upvotes: 2

Views: 2370

Answers (1)

jezrael

Reputation: 863511

I think you need to group by a Series created by splitting each postcode on whitespace and taking the first part:

subtotals = df.groupby(df['Postcode'].str.split().str[0]).count()

Sample:

import pandas as pd

df = pd.DataFrame({'Postcode' :['W1 5DA','W1 5DA','W2 5DA']})
print (df)
  Postcode
0   W1 5DA
1   W1 5DA
2   W2 5DA

print (df['Postcode'].str.split().str[0])
0    W1
1    W1
2    W2
Name: Postcode, dtype: object

subtotals = df.groupby(df['Postcode'].str.split().str[0]).count()
print (subtotals)
          Postcode
Postcode          
W1               2
W2               1
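
For comparison, the helper-column approach mentioned in the question should give the same groups. This is only a minimal sketch on the sample data above; the column name `Outward` is a hypothetical name chosen for illustration, not something from the original data.

```python
import pandas as pd

df = pd.DataFrame({'Postcode': ['W1 5DA', 'W1 5DA', 'W2 5DA']})

# Inline key: group by a derived Series without modifying df.
inline = df.groupby(df['Postcode'].str.split().str[0])['Postcode'].count()

# Helper column ('Outward' is a hypothetical name): same groups,
# but the extra column stays available for later steps.
with_col = df.assign(Outward=df['Postcode'].str.split().str[0])
helper = with_col.groupby('Outward')['Postcode'].count()

print(inline.tolist())
print(helper.tolist())
```

The inline key avoids carrying an extra column around; the helper column may be more convenient if the outward code is needed again later.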

See also: What is the difference between size and count in pandas?
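
A small sketch of that difference, assuming a hypothetical `Amount` column with one missing value (neither the column nor the missing value is in the original data): `count` counts non-null values per group, while `size` counts rows.

```python
import pandas as pd

# Hypothetical data: the 'Amount' column and its missing value are
# assumptions added only to show where count and size diverge.
df = pd.DataFrame({
    'Postcode': ['W1 5DA', 'W1 5DA', 'W2 5DA'],
    'Amount': [10.0, None, 5.0],
})

outward = df['Postcode'].str.split().str[0]

counts = df.groupby(outward)['Amount'].count()  # non-null values per group
sizes = df.groupby(outward).size()              # rows per group, NaN included

print(counts)
print(sizes)
```

If the goal is the number of transactions per outward code regardless of missing values, `size` is probably the one you want.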

Upvotes: 4
