How to select columns based on a list of words in the column names

Question

I have a list of words, greenwords.

print(greenwords)
['eco', 'eco-friendly', 'green', 'organic', 'clean', 'sustainable', 'sustainability', 'carbon', 'emissions']

I want a list of columns where the column title contains one of the greenwords. For example if I wanted a list of all of the columns that contained 'eco' I would do this:

eco_columns = [col for col in df.columns if 'eco' in col]

Doing this does not work:

greenwords_columns = [col for col in df.columns if greenwords in col]

Suggestions for capturing all columns with a greenword in the column title?

Alexander · Accepted Answer

You are close, but you need to check each greenword in each column. Use sets for faster membership testing.

df = pd.DataFrame(columns=['Organic Icecream', 'Green Energy', 'Taxes'])

# Assumes all `greenwords` are lower case.
greenwords = {'eco', 'eco-friendly', 'green', 'organic', 'clean', 'sustainable', 'sustainability', 'carbon', 'emissions'}
greenwords_columns = [col for col in df 
                      if any(greenword in col.lower() for greenword in greenwords)]

>>> greenwords_columns
['Organic Icecream', 'Green Energy']

How to select columns based on a list of words in the column names

Answers (2)

Related Questions