For each row in Pandas dataframe, check if row contains string from list

Question

I have a list of strings, like this:

List=['plastic', 'cardboard', 'wood']

My dataframe has a column of string dtype, like this:

Column=['beer plastic', 'water cardboard', 'eggs plastic', 'fruits wood']

For each row in the column, I want to know whether it contains a word from the list and, if so, keep only the text that comes before that word, like this:

New_Column=['beer', 'water', 'eggs', 'fruits']

Is there a way to apply this to every row of my dataframe (millions of rows)? Thanks

PS. I have tried building a function with regular expression pattern matching, like this:

pattern=re.compile('**Pattern to be defined to include element from list**')

def truncate(row, pattern):
    Column=row['Column']
    if bool(pattern.match(Column)):
        Column=Column.replace(**word from list**,"")
        return Column

df['New_column']=df.apply(truncate,axis=1, pattern=pattern)

Lina Alice Anderson · Accepted Answer

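import pandas as pd
...
for index in df.index:
    for word in List_name:
        # Write through df.at: the row yielded by iterrows() may be a copy,
        # so assigning to it is not guaranteed to change df.
        if word in df.at[index, 'Column_name']:
            # partition(word)[0] is the text before the first occurrence of word
            df.at[index, 'Column_name'] = df.at[index, 'Column_name'].partition(word)[0].rstrip()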

If you want to run a working example:

import pandas as pd

List = ['plastic', 'cardboard', 'wood']
df = pd.DataFrame([{'c1': "fun cardboard", 'c2': "jolly plastic"}, {'c1': "meh wood", 'c2': "aba"}, {'c1': "aaa", 'c2': "bbb"}, {'c1': "old wood", 'c2': "bbb"}])

for index in df.index:
    for word in List:
        for col in ('c1', 'c2'):
            # Write through df.at: the row yielded by iterrows() may be a copy,
            # so assigning to it is not guaranteed to change df.
            if word in df.at[index, col]:
                # Keep the text before the word and drop the trailing space
                df.at[index, col] = df.at[index, col].partition(word)[0].rstrip()
print(df)
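
The loop above visits rows one at a time in Python, which can be slow on millions of rows. As an alternative (not part of the original answer), here is a minimal sketch that does the truncation in one vectorized call with Series.str.replace. It assumes the column holds plain strings, and the names words, Column and New_column are only illustrative. Like the loop, it matches substrings, so 'wood' would also match inside 'wooden'.

```python
import re
import pandas as pd

# Illustrative data mirroring the question (the last row has no listed word).
words = ['plastic', 'cardboard', 'wood']
df = pd.DataFrame({'Column': ['beer plastic', 'water cardboard',
                              'eggs plastic', 'fruits wood', 'milk']})

# Build one alternation from the list; re.escape protects any regex
# metacharacters the words might contain.
# \s* also swallows the whitespace just before the matched word,
# and .*$ drops everything from the word to the end of the string.
pattern = r'\s*(?:' + '|'.join(map(re.escape, words)) + r').*$'

# Rows without any listed word are left unchanged.
df['New_column'] = df['Column'].str.replace(pattern, '', regex=True)

print(df['New_column'].tolist())
# ['beer', 'water', 'eggs', 'fruits', 'milk']
```

If only whole words should count, wrapping the alternation in \b word boundaries is one option.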
