leahyota

Reputation: 29

Loop over each word in each row and remove the word if it is in a list

I have the column below in a DataFrame (each row is a person, and each cell contains a list of tokenised words).

Q395_R

[due, car, accident, year, ago, medical, condi...
[spending, time, loved, one, commute, able, co...
[initially, understanding, need, lockdown, ero...
[time, focus, exercise, le, sport, do, poured,..
[spending, time, family, realisation, need, ru...

I also have a list of words:

words395 = ['rising',
 'accident',
 'le',
 'lasted',
 'understanding',
 'spending',
 'adopted',
 'raising',
 'fabulous',
 'loneliness',
 'contract',....]

I would like to create a function that:

  1. loops over each row (person),
  2. loops over each word in that row, and
  3. deletes the word from the cell if it is in the list words395.

I am not sure how to nest the two loops so that they go through each person and then each word. Can someone help with this?
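The three steps map directly onto two nested for loops. Below is a minimal sketch in plain Python, assuming each cell holds a Python list of strings. The function name remove_listed_words and the sample data are hypothetical. The sample uses only the untruncated words shown in the question and a shortened words395:

```python
def remove_listed_words(rows, words_to_remove):
    """Return a new list of rows with every word in words_to_remove dropped.

    rows: any iterable of token lists, e.g. the values of a DataFrame column.
    """
    to_remove = set(words_to_remove)   # set lookup is O(1) on average
    cleaned_rows = []
    for tokens in rows:                # loop 1: each person (row)
        kept = []
        for word in tokens:            # loop 2: each word in that row
            if word not in to_remove:
                kept.append(word)      # keep only words not in the list
        cleaned_rows.append(kept)
    return cleaned_rows


# Hypothetical sample: untruncated words from the question, shortened word list
rows = [
    ["due", "car", "accident", "year", "ago"],
    ["spending", "time", "loved", "one"],
    ["time", "focus", "exercise", "le", "sport"],
]
words395 = ["rising", "accident", "le", "understanding", "spending"]

print(remove_listed_words(rows, words395))
# [['due', 'car', 'year', 'ago'], ['time', 'loved', 'one'], ['time', 'focus', 'exercise', 'sport']]
```

Because the function accepts any iterable of token lists, passing df['Q395_R'] should work. Wrapping the result as pd.Series(result, index=df.index) before assigning it back keeps the rows aligned with the DataFrame's index.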

Expected outcome:

Q395_R
    
[due, car, year, ago, medical, condi...
[time, loved, one, commute, able, co...
[initially, need, lockdown, ero...
[time, focus, exercise, sport, do, poured,..
[time, family, realisation, need, ru...

Answers (1)

jezrael

Reputation: 863451

Use Series.apply with a lambda function, and convert the list words395 to a set first so that each membership test is fast:

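# convert the list to a set once, so each membership test is fast
s = set(words395)
# for each row, keep only the words that are not in the set
df['Q395_R'] = df['Q395_R'].apply(lambda x: [y for y in x if y not in s])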
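Here is a self-contained run of the code above on a small DataFrame, assuming pandas is installed. The three rows and the shortened words395 are hypothetical samples built from the untruncated words in the question:

```python
import pandas as pd

# Hypothetical sample: each cell holds a list of tokens, as in the question
df = pd.DataFrame({
    "Q395_R": [
        ["due", "car", "accident", "year", "ago"],
        ["spending", "time", "loved", "one"],
        ["initially", "understanding", "need", "lockdown"],
    ]
})
# Shortened subset of the question's words395 list
words395 = ["rising", "accident", "le", "understanding", "spending"]

s = set(words395)  # build the set once, outside the lambda
df["Q395_R"] = df["Q395_R"].apply(lambda x: [y for y in x if y not in s])

print(df["Q395_R"].tolist())
# [['due', 'car', 'year', 'ago'], ['time', 'loved', 'one'], ['initially', 'need', 'lockdown']]
```

Building the set once outside the lambda is what makes this fast. A membership test against a set is O(1) on average, whereas testing against the list would scan all of words395 for every token.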
