Rename columns with regex in Python

Question

I'm trying to rename a large number of columns in my DataFrame in Python. Because more than 1,000 columns need renaming, I'm trying to do it with a regex, since I saw that Python supports this. More specifically, every column ending in _Sum should be renamed, with the _Sum part replaced by '_max' (e.g. column1_Sum -> column1_max). I've tried the following code:

df = df.rename(columns=lambda x: re.sub('(.+)_Sum$','$1_max',x))

But it just replaces every column name with the literal string '$1_max'. I've previously worked with regex in other programs, and I always thought that $1 refers to the first captured group, in this case everything before '_Sum', so I don't really know what I'm doing wrong here.

Shovalt · Accepted Answer

You don't need the capturing groups for your specific problem. You can simply do:

# regex=True is needed in pandas 2.0+, where str.replace matches literally by default
df.columns = df.columns.str.replace('_Sum$', '_max', regex=True)
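As a quick check of the suffix replacement, here is a minimal sketch on a small, made-up DataFrame (the column names and values are hypothetical, chosen only for illustration). It assumes pandas is installed and passes regex=True explicitly so that '$' is treated as an end-of-string anchor:

```python
import pandas as pd

# Hypothetical frame; the column names are illustrative only.
df = pd.DataFrame({"column1_Sum": [1, 2], "column2_Sum": [3, 4], "Sum_total": [5, 6]})

# regex=True makes '$' anchor the match at the end of each column name,
# so only names that END in '_Sum' are changed.
df.columns = df.columns.str.replace(r"_Sum$", "_max", regex=True)

print(list(df.columns))
```

Only the names ending in _Sum change; a name like Sum_total is left alone because the pattern is anchored to the end.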

If you do eventually need capturing groups, note that Python writes backreferences in the replacement string as \1 (or \g<1>), not $1. Alternatively, you can pass a callable:

# a callable replacement only works with regex=True
df.columns = df.columns.str.replace('(.+)_Sum$', lambda x: f'{x.group(1)}_max', regex=True)
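For the original re.sub attempt, the likely culprit is the replacement syntax: Python's re module treats '$1' as literal text and uses \1 (or \g<1>) for the first group. A minimal sketch using only the standard library; the helper name sum_to_max and the sample names are hypothetical:

```python
import re

def sum_to_max(name: str) -> str:
    # \1 refers to the first captured group; '$1' would be inserted literally.
    return re.sub(r"(.+)_Sum$", r"\1_max", name)

# Hypothetical column names for illustration.
names = ["column1_Sum", "column2_Sum", "Summary"]
renamed = [sum_to_max(n) for n in names]

print(renamed)
```

Since DataFrame.rename accepts a function for its columns argument, the same helper should work as df = df.rename(columns=sum_to_max).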

See: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.replace.html
