JuKo

Reputation: 21

Python: Splitting a string at any uppercase letter (as part of a rename for a column name)

I would like to rename the columns of a Pandas dataframe using the rename function, and to do that I need to split each name (a string) at every uppercase letter within it. For example, my column names are something like 'FooBar' or 'SpamEggs', and one column is called 'Monty-Python'. My goal is column names like 'foo_bar', 'spam_eggs' and 'monty_python'.

I know that

'-'.join(re.findall('[A-Z][a-z]*', 'FooBar'))

will give me Foo-Bar

But this cannot be included in my rename function:

df.rename(columns=lambda x: x.strip().lower().replace("-", "_"), inplace=True)

(The snippet should go between strip and lower, but when I put it there I get a SyntaxError.)

Can anyone help me include the snippet in the rename call, or suggest a solution other than findall?

Upvotes: 1

Views: 840

Answers (1)

cs95

Reputation: 402824

  1. Remove anything that is not a word character (\W matches everything except letters, digits and underscores, so the hyphen is dropped)
  2. Prepend an underscore (_) to uppercase letters that are not at the start of the string
  3. Lowercase the result
df.columns
Index(['FooBar', 'SpamEggs', 'Monty-Python'], dtype='object')

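# regex=True is needed on pandas >= 2.0, where str.replace matches literally by default
df.columns.str.replace(r'\W', '', regex=True)\
          .str.replace(r'(?<!^)([A-Z])', r'_\1', regex=True)\
          .str.lower()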
Index(['foo_bar', 'spam_eggs', 'monty_python'], dtype='object')

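This solution generalises to any CamelCase or hyphenated column name. The string methods return a new Index rather than modifying the dataframe, so assign the result back to df.columns.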
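If you would rather keep the df.rename call from the question, the same three steps can be wrapped in a plain function and passed as the columns argument. This is only a minimal sketch: the name to_snake_case is made up for illustration, and it assumes the column names are ordinary strings.

```python
import re

def to_snake_case(name):
    """Convert a CamelCase or hyphenated name to snake_case (hypothetical helper)."""
    # 1. Strip surrounding whitespace and drop non-word characters such as '-'.
    cleaned = re.sub(r'\W', '', name.strip())
    # 2. Prepend '_' to every uppercase letter that is not at the start.
    separated = re.sub(r'(?<!^)([A-Z])', r'_\1', cleaned)
    # 3. Lowercase the result.
    return separated.lower()

names = ['FooBar', 'SpamEggs', 'Monty-Python']
print([to_snake_case(n) for n in names])
```

With a dataframe this would be used as df.rename(columns=to_snake_case, inplace=True). One caveat: every uppercase letter gets its own underscore, so a name with consecutive capitals such as 'HTTPServer' comes out as 'h_t_t_p_server'.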

Upvotes: 2
