pandas.DataFrame.replace with wildcards

Question

Does regex replacement in pandas.DataFrame.replace support wildcards and "capture groups"?

E.g., to replace ([A-Z])(\w+) with \2\1?

What kind of regular expression is supported? Are Perl's regex extensions supported? E.g., is it OK to replace ([A-Z])(\w+) with \l\1\2 (\l: change the next character to lowercase)?

UPDATE:

As Steve has pointed out, according to the Python documentation, it should work, but the following is not giving me what I expected:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.random.choice(['foo', 'bar'], 100),
                   'B': np.random.choice(['one', 'two', 'three'], 100),
                   'C': np.random.choice(['I1', 'I2', 'I3', 'I4'], 100),
                   'D': np.random.randint(-10, 11, 100),
                   'E': np.random.randn(100)})
df.replace("f(.)(.)", "b\1\2", regex=True, inplace=True)

What's wrong?

Thx

Steven Doggart · Accepted Answer

According to the pandas documentation:

Regex substitution is performed under the hood with re.sub. The rules for substitution for re.sub are the same.

So, yes, any substitutions which can be performed with Python's re.sub (such as \1) can also be performed with pandas.DataFrame.replace. See the Python documentation for more information.
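As for the snippet in the question, the likely culprit is the plain string literal: in "b\1\2" Python itself interprets \1 and \2 as octal escapes (control characters) before the regex engine ever sees them. Writing the pattern and replacement as raw strings should avoid that. Python's re module also has no Perl-style \l, so a case change needs a function as the replacement; Series.str.replace accepts one when regex=True. Below is a minimal sketch on a small made-up frame (the data and variable names are illustrative, not from the question):

```python
import pandas as pd

# Small illustrative frame; the values are made up for this sketch.
df = pd.DataFrame({"A": ["foo", "bar", "foo"], "D": [1, 2, 3]})

# Raw strings keep \1 and \2 as backreferences for re.sub.
fixed = df.replace(r"f(.)(.)", r"b\1\2", regex=True)

# The swap from the question: ([A-Z])(\w+) -> \2\1
words = pd.Series(["Hello", "World"])
swapped = words.replace(r"([A-Z])(\w+)", r"\2\1", regex=True)

# No \l in Python's re: use a callable replacement to lowercase group 1.
lowered = words.str.replace(
    r"([A-Z])(\w+)",
    lambda m: m.group(1).lower() + m.group(2),
    regex=True,
)

print(fixed["A"].tolist())
print(swapped.tolist())
print(lowered.tolist())
```

Non-string columns such as D are left alone, since the regex is only applied to string values.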
