xpt

Reputation: 23036

pandas.DataFrame.replace with wildcards

Does the regex mode of pandas.DataFrame.replace support wildcards and capture groups?

For example, can I replace ([A-Z])(\w+) with \2\1?

Which regular expression syntax is supported? Is Perl's regex syntax supported? For example, is it OK to replace ([A-Z])(\w+) with \l\1\2 (where \l changes the next character to lowercase)?
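For reference, Python's re replacement templates understand backreferences such as \1 and \g<1>, but not Perl's case-changing escapes like \l. Since Python 3.7, an unknown backslash-letter escape in the replacement raises re.error. The sketch below shows that behavior in plain re and the usual workaround, which is to pass a function as the replacement. The helper name lower_first is hypothetical.

```python
import re

pattern = re.compile(r"([A-Z])(\w+)")

# Perl's \l is not part of Python's replacement syntax;
# Python 3.7+ rejects unknown escapes of ASCII letters with re.error.
try:
    pattern.sub(r"\l\1\2", "Hello World")
except re.error as exc:
    print("not supported:", exc)


# Hypothetical helper: emulates Perl's \l\1\2 by lowercasing group 1
# and appending group 2 unchanged.
def lower_first(m: re.Match) -> str:
    return m.group(1).lower() + m.group(2)


print(pattern.sub(lower_first, "Hello World"))  # hello world
```

DataFrame.replace expects a plain replacement value rather than a callable. Series.str.replace does accept a callable repl when regex=True, so a function like this one can be applied column by column.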

UPDATE:

As Steve pointed out, according to the Python documentation this should work, but the following does not give me what I expected:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.random.choice(['foo', 'bar'], 100),
                   'B': np.random.choice(['one', 'two', 'three'], 100),
                   'C': np.random.choice(['I1', 'I2', 'I3', 'I4'], 100),
                   'D': np.random.randint(-10, 11, 100),
                   'E': np.random.randn(100)})
# Intended: turn 'foo' into 'boo' using the two capture groups
df.replace("f(.)(.)", "b\1\2", regex=True, inplace=True)

What's wrong?

Thx

Upvotes: 3

Views: 2059

Answers (1)

Steven Doggart

Reputation: 43743

According to the pandas documentation:

Regex substitution is performed under the hood with re.sub. The rules for substitution for re.sub are the same.
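The short sketch below compares DataFrame.replace with re.sub on the question's ([A-Z])(\w+) to \2\1 example. The DataFrame contents are made up for illustration. String cells that match are rewritten exactly as a per-cell re.sub would rewrite them, and the numeric column is left alone.

```python
import re

import pandas as pd

df = pd.DataFrame({"name": ["Alice", "bob", "Carol"], "n": [1, 2, 3]})

pattern = r"([A-Z])(\w+)"
replacement = r"\2\1"  # raw string so \2 and \1 reach the regex engine intact

# DataFrame.replace with regex=True rewrites matching string cells...
via_pandas = df.replace(pattern, replacement, regex=True)

# ...which should agree with calling re.sub on each string by hand.
via_re = df["name"].map(lambda s: re.sub(pattern, replacement, s))

print(via_pandas["name"].tolist())  # ['liceA', 'bob', 'arolC']
print(via_re.tolist())              # ['liceA', 'bob', 'arolC']
```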

So yes: any substitution that Python's re.sub can perform (such as the backreference \1) can also be performed with pandas.DataFrame.replace. See the Python re module documentation for the full replacement syntax.
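In the code from the question's update, the likely culprit is Python string escaping rather than pandas. In an ordinary (non-raw) literal, "b\1\2" is parsed as octal escapes, so the replacement passed to re.sub is "b" followed by the control characters chr(1) and chr(2), not two backreferences. The minimal sketch below uses a small stand-in DataFrame with made-up values to show the difference. Writing the replacement as a raw string, r"b\1\2", or doubling the backslashes, "b\\1\\2", passes the backreferences through.

```python
import pandas as pd

# In a normal string literal, \1 and \2 are octal escapes, not backreferences.
print(repr("b\1\2"))   # 'b\x01\x02'
print(repr(r"b\1\2"))  # 'b\\1\\2'

df = pd.DataFrame({"A": ["foo", "bar", "foo"]})

# Non-raw replacement: 'foo' becomes 'b' plus two control characters.
broken = df.replace("f(.)(.)", "b\1\2", regex=True)

# Raw-string replacement: \1 and \2 refer to the captured characters,
# so 'foo' becomes 'boo', while 'bar' does not match and is left alone.
fixed = df.replace(r"f(.)(.)", r"b\1\2", regex=True)

print(broken["A"].tolist())  # ['b\x01\x02', 'bar', 'b\x01\x02']
print(fixed["A"].tolist())   # ['boo', 'bar', 'boo']
```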

Upvotes: 3
