Python Groupby part of a string

Question

I'm grouping a list of transactions by UK postcode, but I only want to group by the first part of the postcode. UK postcodes have two parts, outward and inward, separated by a space, e.g. W1 5DA.

subtotals = df.groupby('Postcode').count()

That is how I'm doing it now. The approach I've thought of so far is to add another column to the DataFrame containing just the first word of the Postcode column, and then group by that, but I'm wondering if there's an easier way to do it.

Thank you

jezrael · Accepted Answer

I think you need to group by a Series created by splitting each postcode on whitespace and taking the first part:

subtotals = df.groupby(df['Postcode'].str.split().str[0]).count()

Sample:

import pandas as pd

df = pd.DataFrame({'Postcode': ['W1 5DA', 'W1 5DA', 'W2 5DA']})
print(df)
  Postcode
0   W1 5DA
1   W1 5DA
2   W2 5DA

print(df['Postcode'].str.split().str[0])
0    W1
1    W1
2    W2
Name: Postcode, dtype: object

subtotals = df.groupby(df['Postcode'].str.split().str[0]).count()
print(subtotals)
          Postcode
Postcode          
W1               2
W2               1
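
For comparison, here is a minimal sketch of the helper-column approach described in the question next to grouping by the derived Series directly. The column name Outward is an assumption chosen for illustration; it does not appear in the original data.

```python
import pandas as pd

df = pd.DataFrame({'Postcode': ['W1 5DA', 'W1 5DA', 'W2 5DA']})

# Outward code: the text before the first run of whitespace.
# 'Outward' is a hypothetical name used only in this sketch.
outward = df['Postcode'].str.split().str[0].rename('Outward')

# Approach from the question: add a helper column, then group by its name.
by_column = df.assign(Outward=outward).groupby('Outward').size()

# Approach from the answer: pass the derived Series straight to groupby,
# leaving the original DataFrame unchanged.
by_series = df.groupby(outward).size()

print(by_column)
print(by_series)
```

Both variants should give the same group sizes; the second simply avoids storing the extra column on the DataFrame.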

See also: What is the difference between size and count in pandas?
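
As a rough illustration of that difference: count() counts the non-null values in each column, while size() counts the rows in each group. The Amount column and its missing value below are assumptions added only to make the two results differ.

```python
import pandas as pd

# 'Amount' is a hypothetical column with one missing value.
df = pd.DataFrame({
    'Postcode': ['W1 5DA', 'W1 5DA', 'W2 5DA'],
    'Amount': [10.0, None, 5.0],
})

outward = df['Postcode'].str.split().str[0]

# count() skips the missing Amount in the W1 group.
counts = df.groupby(outward)['Amount'].count()

# size() counts every row in the group, missing values included.
sizes = df.groupby(outward).size()

print(counts)
print(sizes)
```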
