Reputation: 131
I'm trying to rename a bunch of columns in my df in Python. Because there are over 1000 that should be renamed, I'm trying to do it with regex, since I saw that Python allows this. More specifically, every column ending in _Sum should be renamed, with the _Sum part replaced by '_max' (e.g. column1_Sum -> column1_max). I've tried the following code:
df = df.rename(columns=lambda x: re.sub('(.+)_Sum$','$1_max',x))
But it just replaces every column name literally with '$1_max'. I've previously worked with regex in other programs and always thought that $1 refers to the captured group, in this case everything before the '_Sum', so I don't really know what I'm doing wrong here.
Upvotes: 0
Views: 104
Reputation: 6776
You don't need the capturing groups for your specific problem. You can simply do:
df.columns = df.columns.str.replace('_Sum$', '_max', regex=True)
In case you do eventually need capturing groups, you can use something like:
df.columns = df.columns.str.replace(r'(.+)_Sum$', lambda m: f'{m.group(1)}_max', regex=True)
See: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.replace.html
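As for why the original attempt inserted '$1_max' literally: Python's re module does not use the $1 syntax for backreferences in the replacement string. It expects \1 (or \g<1>) instead, ideally written as a raw string. Here is a minimal sketch of both approaches with a string backreference; the DataFrame and its column names are made up for illustration:

```python
import re
import pandas as pd

# Hypothetical frame, invented only to demonstrate the renaming.
df = pd.DataFrame(columns=["column1_Sum", "column2_Sum", "other"])

# Option 1: rename + re.sub, using \1 (not $1) for the captured group.
renamed = df.rename(columns=lambda name: re.sub(r"(.+)_Sum$", r"\1_max", name))

# Option 2: the vectorized string method, with regex=True set explicitly.
replaced = df.columns.str.replace(r"(.+)_Sum$", r"\1_max", regex=True)

print(list(renamed.columns))
print(list(replaced))
```

Columns that don't end in _Sum (like "other" here) are left unchanged by both options.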
Upvotes: 4