Loop over each word in each row and remove words if in a list

Question

I have the column below in a dataframe (each row is a person, and each cell holds a list of tokenised words).

Q395_R

[due, car, accident, year, ago, medical, condi...
[spending, time, loved, one, commute, able, co...
[initially, understanding, need, lockdown, ero...
[time, focus, exercise, le, sport, do, poured,..
[spending, time, family, realisation, need, ru...

I also have a list of words:

words395 = ['rising',
 'accident',
 'le',
 'lasted',
 'understanding',
 'spending',
 'adopted',
 'raising',
 'fabulous',
 'loneliness',
 'contract',....]

I would like to create a function that:

loops over each row (one person per row)
loops over each word in that row
deletes the word from the cell if it is in the list words395

I am not sure how to nest the two loops so that they go through each person and each word. Can someone help with this?

Expected outcome:

Q395_R
    
[due, car, year, ago, medical, condi...
[time, loved, one, commute, able, co...
[initially, need, lockdown, ero...
[time, focus, exercise, sport, do, poured,..
[time, family, realisation, need, ru...

jezrael · Accepted Answer

Use a lambda function with a list comprehension, converting the list of words to a set first so that membership checks are fast:

s = set(words395)
df['Q395_R'] = df['Q395_R'].apply(lambda x: [y for y in x if y not in s])
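For comparison, here is a rough sketch of the two explicit loops the question asks about, written on plain Python lists so it runs without pandas. The sample rows and the shortened words395 are illustrative only (the question's data is truncated), and the function name remove_listed_words is made up for this example:

```python
# Illustrative rows modelled on the question; the real cells are truncated there.
rows = [
    ["due", "car", "accident", "year", "ago", "medical"],
    ["spending", "time", "loved", "one", "commute", "able"],
    ["time", "focus", "exercise", "le", "sport", "do", "poured"],
]
# Shortened, illustrative version of the question's word list.
words395 = ["rising", "accident", "le", "lasted", "understanding", "spending"]


def remove_listed_words(rows, stop_words):
    """Return new rows with every word found in stop_words removed."""
    stop_set = set(stop_words)  # set lookup avoids rescanning the list per word
    cleaned = []
    for row in rows:  # outer loop: one person per row
        kept = []
        for word in row:  # inner loop: each word in that person's list
            if word not in stop_set:
                kept.append(word)
        cleaned.append(kept)
    return cleaned


cleaned_rows = remove_listed_words(rows, words395)
print(cleaned_rows)
```

In the one-liner above, apply plays the role of the outer loop and the list comprehension plays the role of the inner loop, so both versions should give the same result for each row.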
