Joelle Kappert

Reputation: 3

Dropping different values from multiple columns

I have a Pandas dataframe with 28 columns in total. Each column name is followed by a unique number. I want to drop the numbers from the column names but keep the names themselves. What is the best way to do that?

Here is an example of the columns:

Miscellaneous group | 00002928
Alcoholic Beverages | 0000292
Animal fats group | 000029

I already tried .rename(), but spelling out the mapping for all 28 columns is inefficient and time-consuming. It also makes for a very long code cell in my Google Colab notebook.

Upvotes: 0

Views: 59

Answers (2)

ouroboros1

Reputation: 14369

Using df.columns.str.split (the regex argument requires pandas 1.4 or later):

columns = ["Miscellaneous group | 00002928",  
           "Alcoholic Beverages | 0000292",
           "Animal fats group | 000029"]

df = pd.DataFrame(columns=columns)

df.columns = df.columns.str.split(r'\s+\|', regex=True).str[0]

Or df.columns.str.replace:

df.columns = df.columns.str.replace(r'\s+\|.*$', '', regex=True)

Also possible via map and re.sub:

import re

df.columns = map(lambda x: re.sub(r'\s+\|.*$', '', x), df.columns)

With df.rename you could apply logic like:

df = df.rename(columns=lambda x: x.split(' |')[0])

Or indeed via re.split:

df = df.rename(columns=lambda x: re.split(r'\s+\|', x)[0])

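For the regex pattern, see regex101. In short, \s+\|.*$ matches one or more whitespace characters, a literal |, and everything after it up to the end of the string.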
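To see what the pattern actually removes, here is a minimal sketch using only the re module on the sample column names. The fourth name, with irregular spacing and no space after the |, is a made-up example added to show where the regex and the literal split(' |') differ.

```python
import re

# Pattern from the answer above: whitespace, a literal "|", then the rest.
pattern = re.compile(r"\s+\|.*$")

names = [
    "Miscellaneous group | 00002928",
    "Alcoholic Beverages | 0000292",
    "Animal fats group | 000029",
    "Sugar crops   |0000123",  # hypothetical name with irregular spacing
]

for name in names:
    match = pattern.search(name)
    # match.group() is the part that gets removed; the rest is kept.
    removed = match.group() if match else ""
    kept = pattern.sub("", name)
    print(f"kept={kept!r} removed={removed!r}")

cleaned = [pattern.sub("", name) for name in names]

# The literal split only cuts at the single space directly before "|",
# so any extra whitespace stays attached to the name.
literal = [name.split(" |")[0] for name in names]
```

For the three names from the question both approaches give the same result; they only diverge when more than one whitespace character precedes the |.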

Upvotes: 1

David Buck

Reputation: 3843

Assuming you're starting off with, e.g.

df.columns = ["Miscellaneous group | 00002928",  "Alcoholic Beverages | 0000292",   "Animal fats group | 000029"]

The simplest solution is probably a list comprehension that iterates over the column names, splits each one on " | ", and keeps the first part of the resulting list:

df.columns = [col.split(" | ")[0] for col in df.columns]

The column names are then:

['Miscellaneous group', 'Alcoholic Beverages', 'Animal fats group']

Alternatively, you could do this with a regex:

import re

df.columns = [re.sub(r'\s*\|.*', '', col) for col in df.columns]

This matches optional whitespace, followed by |, followed by everything up to the end of the string, and replaces it all with an empty string.

Final alternative:

df.columns = [re.sub(r'\s*\d+$', '', s) for s in df.columns]

This matches optional whitespace followed by digits at the end of each string, so it removes the trailing digits regardless of what precedes them (useful in case the | isn't always present). Note that it leaves the | itself in place, so on the example columns it would produce:

['Miscellaneous group |', 'Alcoholic Beverages |', 'Animal fats group |']
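If the leftover | is unwanted, one possible variation is to make the | an optional part of the match. This is only a sketch: the helper name strip_trailing_code and the second sample name (written without a |) are assumptions for illustration, not something from the question.

```python
import re

# Hypothetical pattern: optional whitespace, optional "|", optional
# whitespace, then digits anchored at the end of the string.
TRAILING_CODE = re.compile(r"\s*\|?\s*\d+$")


def strip_trailing_code(name: str) -> str:
    """Remove a trailing numeric code and the "|" in front of it, if any."""
    return TRAILING_CODE.sub("", name)


names = [
    "Miscellaneous group | 00002928",
    "Alcoholic Beverages 0000292",  # assumed variant without the "|"
    "Animal fats group | 000029",
]

cleaned = [strip_trailing_code(n) for n in names]
print(cleaned)
```

On a dataframe this would be applied as df.columns = [strip_trailing_code(c) for c in df.columns]. As with the previous pattern, a name that legitimately ends in digits would lose them too.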

Upvotes: 0
