Reputation: 922
I have a list of words, greenwords.
print(greenwords)
['eco', 'eco-friendly', 'green', 'organic', 'clean', 'sustainable', 'sustainability', 'carbon', 'emissions']
I want a list of columns where the column title contains one of the greenwords. For example if I wanted a list of all of the columns that contained 'eco' I would do this:
eco_columns = [col for col in df.columns if 'eco' in col]
Doing this does not work:
greenwords_columns = [col for col in df.columns if greenwords in col]
Suggestions for capturing all columns with a greenword in the column title?
Upvotes: 1
Views: 832
Reputation: 109528
You are close, but you need to check each greenword in each column. Use sets for faster membership testing.
df = pd.DataFrame(columns=['Organic Icecream', 'Green Energy', 'Taxes'])
# Assumes all `greenwords` are lower case.
greenwords = {'eco', 'eco-friendly', 'green', 'organic', 'clean', 'sustainable', 'sustainability', 'carbon', 'emissions'}
greenwords_columns = [col for col in df
if any(greenword in col.lower() for greenword in greenwords)]
>>> greenwords_columns
['Organic Icecream', 'Green Energy']
Upvotes: 1
Reputation: 16683
You can convert your list to a string separating each value by |
with '|'.join()
. From there you can utilize str.contains
to return columns that contain any of the greenwords
:
df.columns[df.columns.str.contains('|'.join(greenwords))]
You can also pass case=False
to match case-insensitively:
df.columns[df.columns.str.contains('|'.join(greenwords), case=False)]
Upvotes: 1