User12345
User12345

Reputation: 5480

Replace column values using regex in pandas data frame

I have a column in pandas data frame like below. Column name is ABC

ABC
Fuel
FUEL
Fuel_12_ab
Fuel_1
Lube
Lube_1
Lube_12_a
cat_Lube

Now I want to replace the values in this column using regex like below

ABC
Fuel
FUEL
Fuel
Fuel
Lube
Lube
Lube
cat_Lube

How can we do this type of string matching in pandas data frame.

Upvotes: 7

Views: 11877

Answers (3)

cs95
cs95

Reputation: 402263

Alt with str.extract:

df.ABC.str.extract('^(.*?)(?=_\d|$)', expand=False)

0        Fuel
1        FUEL
2        Fuel
3        Fuel
4        Lube
5        Lube
6        Lube
7    cat_Lube
Name: ABC, dtype: object

Extension courtesy piRSquared:

df.ABC.str.extract('(.*(?<=lube|fuel)).*', re.IGNORECASE, expand=False)

0        Fuel
1        FUEL
2        Fuel
3        Fuel
4        Lube
5        Lube
6        Lube
7    cat_Lube
Name: ABC, dtype: object

Upvotes: 5

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210822

In [63]: df.ABC.str.replace(r'_\d+.*', r'')
Out[63]:
0        Fuel
1        FUEL
2        Fuel
3        Fuel
4        Lube
5        Lube
6        Lube
7    cat_Lube
Name: ABC, dtype: object

Upvotes: 8

piRSquared
piRSquared

Reputation: 294218

Use positive lookbehind for lube or fuel while ignoring case.

import re
import pandas as pd

pat = re.compile('(?<=lube|fuel)_', re.IGNORECASE)

df.assign(ABC=[re.split(pat, x, 1)[0] for x in df.ABC])

        ABC
0      Fuel
1      FUEL
2      Fuel
3      Fuel
4      Lube
5      Lube
6      Lube
7  cat_Lube

Upvotes: 5

Related Questions