Reputation: 5480
I have a column in pandas
data frame like below. Column name is ABC
ABC
Fuel
FUEL
Fuel_12_ab
Fuel_1
Lube
Lube_1
Lube_12_a
cat_Lube
Now I want to replace the values in this column using regex like below
ABC
Fuel
FUEL
Fuel
Fuel
Lube
Lube
Lube
cat_Lube
How can we do this type of string matching in pandas
data frame.
Upvotes: 7
Views: 11877
Reputation: 402263
Alt with str.extract
:
df.ABC.str.extract('^(.*?)(?=_\d|$)', expand=False)
0 Fuel
1 FUEL
2 Fuel
3 Fuel
4 Lube
5 Lube
6 Lube
7 cat_Lube
Name: ABC, dtype: object
Extension courtesy piRSquared:
df.ABC.str.extract('(.*(?<=lube|fuel)).*', re.IGNORECASE, expand=False)
0 Fuel
1 FUEL
2 Fuel
3 Fuel
4 Lube
5 Lube
6 Lube
7 cat_Lube
Name: ABC, dtype: object
Upvotes: 5
Reputation: 210822
In [63]: df.ABC.str.replace(r'_\d+.*', r'')
Out[63]:
0 Fuel
1 FUEL
2 Fuel
3 Fuel
4 Lube
5 Lube
6 Lube
7 cat_Lube
Name: ABC, dtype: object
Upvotes: 8
Reputation: 294218
Use positive lookbehind for lube
or fuel
while ignoring case.
import re
import pandas as pd
pat = re.compile('(?<=lube|fuel)_', re.IGNORECASE)
df.assign(ABC=[re.split(pat, x, 1)[0] for x in df.ABC])
ABC
0 Fuel
1 FUEL
2 Fuel
3 Fuel
4 Lube
5 Lube
6 Lube
7 cat_Lube
Upvotes: 5