mk2080
mk2080

Reputation: 922

How to select columns based on a list of words in the column names

I have a list of words, greenwords.

print(greenwords)
['eco', 'eco-friendly', 'green', 'organic', 'clean', 'sustainable', 'sustainability', 'carbon', 'emissions']

I want a list of columns where the column title contains one of the greenwords. For example if I wanted a list of all of the columns that contained 'eco' I would do this:

eco_columns = [col for col in df.columns if 'eco' in col]

Doing this does not work:

greenwords_columns = [col for col in df.columns if greenwords in col]

Suggestions for capturing all columns with a greenword in the column title?

Upvotes: 1

Views: 832

Answers (2)

Alexander
Alexander

Reputation: 109528

You are close, but you need to check each greenword in each column. Use sets for faster membership testing.

df = pd.DataFrame(columns=['Organic Icecream', 'Green Energy', 'Taxes'])

# Assumes all `greenwords` are lower case.
greenwords = {'eco', 'eco-friendly', 'green', 'organic', 'clean', 'sustainable', 'sustainability', 'carbon', 'emissions'}
greenwords_columns = [col for col in df 
                      if any(greenword in col.lower() for greenword in greenwords)]

>>> greenwords_columns
['Organic Icecream', 'Green Energy']

Upvotes: 1

David Erickson
David Erickson

Reputation: 16683

You can convert your list to a string separating each value by | with '|'.join(). From there you can utilize str.contains to return columns that contain any of the greenwords:

df.columns[df.columns.str.contains('|'.join(greenwords))]

You can also pass case=False to match case-insensitively:

df.columns[df.columns.str.contains('|'.join(greenwords), case=False)]

Upvotes: 1

Related Questions