Reputation: 1821
I have a dataframe with 20 columns, and 3 of those columns (always the same) may contain one or more of these strings ["fraction", "fractional", "1/x", "one fifth"].
I want to add a new column that says whether or not each row is "fractional" (in other words, contains one of those words). This column could have Y or N to indicate this.
I've tried to do it with iterrows
, like so:
list_of_fractional_widgets = []
for index, row in df.iterrows():
fractional_keywords = ["fraction", "fractional", "1/x", "one fifth", "Fraction"]
# use str to remove offending nan values
xx = str(row["HeaderX"])
yy = str(row["HeaderY"])
zz = str(row["HeaderZ"])
widget_data = [xx, yy, zz]
for word in fractional_keywords:
found = [True for x in widget_data if word in x]
if len(found)>0:
list_of_fractional_widgets.append('Y')
break
if len(found) ==0:
list_of_fractional_widgets.append('N')
df['Fractional?'] = list_of_fractional_widgets
however, I'm trying to understand if there is a more pandas
/ numpy
efficient way to do so. Something like:
np.where(df['HeaderX'].str.contains(fractional_keywords?)), True)
as described in this SO question, but using a list and different headers.
Upvotes: 0
Views: 459
Reputation: 59579
Create a single pattern by joining all the words with '|'
. Then we check the condition in each column separately using Series.str.contains
and create a single mask using np.logical_or.reduce
.
import pandas as pd
import numpy as np
keywords = ["fraction", "fractional", "1/x", "one fifth", "Fraction"]
np.random.seed(45)
df = pd.DataFrame(np.random.choice(keywords+list('abcdefghijklm'), (4,3)),
columns=['HeaderX', 'HeaderY', 'HeaderZ'])
pat = '|'.join(keywords)
df['Fractional?'] = np.logical_or.reduce([df[col].str.contains(pat)
for col in ['HeaderX', 'HeaderY', 'HeaderZ']])
HeaderX HeaderY HeaderZ Fractional?
0 g one fifth fraction True
1 one fifth Fraction k True
2 fractional j d True
3 j d h False
As a bonus, Series.str.contains
can accept a case=False
argument to ignore case when matching so there is no need to separately specify both 'fraction'
and 'Fraction'
(or any arbitrary capitalization like 'FracTIOn'
).
Upvotes: 2