Pandas - remove combination of string and number from column

Question

Below is a subset of a pandas DataFrame with a Name column like this:

   No                   Name
0    1   SOU 01 Sungai Dingin
1    2                  PKS 2
2    3                 Mill 3
3    4    Tanah Kerajaan Mill
4    5                MAS POM
5    6           SOU 20 Chaah
6    7     SOU 03 Elphil Mill
7    8       SOU 08 East Mill
8    9  SOU 04 Flemington POM
9   10    SOU 30A Jeleta Bumi
10  11         SOU 30B Mostyn
11  12          KLK - Mill 02
12  13           Chini 02 POM
13  14      SOU 05 Selaba POM
14  15     SOU 9A Sepang Mill

I am trying to figure out the best way to use regex in Python to remove just the 'SOU XX' or 'SOU XXX' prefix (a combination of letters and numbers) from that column without affecting the other text in it.

The output would be something like the below:

    No                 Name
0    1        Sungai Dingin
1    2                PKS 2
2    3               Mill 3
3    4  Tanah Kerajaan Mill
4    5              MAS POM
5    6                Chaah
6    7          Elphil Mill
7    8            East Mill
8    9       Flemington POM
9   10          Jeleta Bumi
10  11               Mostyn
11  12        KLK - Mill 02
12  13         Chini 02 POM
13  14           Selaba POM
14  15          Sepang Mill

Aran-Fey · Accepted Answer

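You can use the regex ^SOU \S{2,3} (note the trailing space at the end) with str.replace. Pass regex=True, because pandas 2.0 and later treat the pattern as a literal string by default:

df['Name'] = df['Name'].str.replace(r'^SOU \S{2,3} ', '', regex=True)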

Result:

    No                 Name
0    1        Sungai Dingin
1    2                PKS 2
2    3               Mill 3
3    4  Tanah Kerajaan Mill
4    5              MAS POM
5    6                Chaah
6    7          Elphil Mill
7    8            East Mill
8    9       Flemington POM
9   10          Jeleta Bumi
10  11               Mostyn
11  12        KLK - Mill 02
12  13         Chini 02 POM
13  14           Selaba POM
14  15          Sepang Mill

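The regex ^SOU \S{2,3} matches the letters "SOU", a single space, any two or three non-whitespace characters (\S{2,3}), and the trailing space that follows them. Thanks to the ^ anchor, it only matches at the start of the string, so names such as "Chini 02 POM" or "KLK - Mill 02" are left untouched.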
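To see how the pattern behaves outside pandas, here is a minimal sketch that uses only the standard re module on a few names taken from the question. The helper name strip_sou_prefix and the constant SOU_PREFIX are made up for illustration and are not part of the original answer.

```python
import re

# Same pattern as in the answer: "SOU", a space, two or three
# non-whitespace characters, then a trailing space, anchored at the start.
SOU_PREFIX = re.compile(r'^SOU \S{2,3} ')


def strip_sou_prefix(name):
    """Remove a leading 'SOU XX ' or 'SOU XXX ' prefix, if present."""
    return SOU_PREFIX.sub('', name)


# A few names copied from the question's sample data.
names = [
    'SOU 01 Sungai Dingin',
    'SOU 30A Jeleta Bumi',
    'SOU 9A Sepang Mill',
    'KLK - Mill 02',
    'Chini 02 POM',
    'PKS 2',
]

cleaned = [strip_sou_prefix(n) for n in names]
for before, after in zip(names, cleaned):
    print(f'{before!r} -> {after!r}')
```

Two things to keep in mind: because \S accepts any non-whitespace character, a name like 'SOU ABC Mill' would also lose its first two words, and a one-character code such as 'SOU 9 Chaah' would not match at all. If the codes in your data are always digits with an optional letter, a tighter pattern may be worth considering.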
