NLP preprocessing remove all words in string not found in my list

Question

I've made a list of important_words, and I have a dataframe with a column df['reviews'] that holds one string of review text per row (thousands of rows). I want to update 'reviews' by removing every word that is not in the important_words list, i.e. the opposite of stop-word removal, so that each review (row) in the df keeps only its important words.
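
Keeping only whitelisted words is stop-word removal with the membership test inverted. A minimal sketch on the raw 'reviews' column, assuming pandas and that matching should be case-insensitive on whole words. The two-row DataFrame only reproduces the example reviews given in the question; the comma-separated output mirrors the format asked for.

```python
import re

import pandas as pd

important_words = ['actor', 'action', 'awesome']

# Sample data reproducing the two example reviews from the question.
df = pd.DataFrame({'reviews': [
    'The actor, in the action movie was awesome',
    'The action movie was not good',
]})

# One alternation pattern; the \b anchors stop "action" from matching inside
# a longer word such as "actions", and re.escape guards against regex
# metacharacters in the word list.
pattern = r'\b(?:' + '|'.join(map(re.escape, important_words)) + r')\b'

# Lowercase first so "Actor" and "actor" are treated alike, collect the
# matching words in order, then join them into one string per row.
df['review_important_words'] = (
    df['reviews']
    .str.lower()
    .str.findall(pattern)
    .str.join(', ')
)

print(df['review_important_words'].tolist())
# ['actor, action, awesome', 'action']
```

Because the regex finds words directly in the text, this variant does not need the tokenized column, and punctuation such as the comma after "actor" is simply skipped.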

Also, later in my starter code I tokenize and normalize df['reviews'] (punctuation removal and lowercasing), so applying the filter to that tokenized column seems like it should make things easier. I'll try whichever method someone can share, thanks.

important_words = ['actor', 'action', 'awesome']

   df['reviews'][1] = 'The actor, in the action movie was awesome'
   df['reviews'][2] = 'The action movie was not good'
   ...
   df['tokenized_normalized_reviews'][1] = ['the', 'actor', 'in', 'the', 'action', 'movie', 'was', 'awesome']
   df['tokenized_normalized_reviews'][2] = ['the', 'action', 'movie', 'was', 'not', 'good']

I want:

   df['review_important_words'][1] = 'actor, action, awesome'
   df['review_important_words'][2] = 'action'

(either as a string, or as a list if the filter is applied to the tokenized column)
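
Working from the tokenized column, each row is already a list of lowercased tokens, so the filter reduces to a membership test per token. A minimal sketch, assuming pandas and that the tokenized column holds plain Python lists of strings. The helper keep_important, its dedupe flag and the intermediate 'important_tokens' column are hypothetical names chosen for illustration. A set is used because membership tests on it are constant time, which helps over thousands of rows.

```python
import pandas as pd

important_words = ['actor', 'action', 'awesome']
# A set gives constant-time membership tests, unlike scanning a list.
important_set = set(important_words)

# Sample tokenized/normalized column reproducing the question's examples.
df = pd.DataFrame({'tokenized_normalized_reviews': [
    ['the', 'actor', 'in', 'the', 'action', 'movie', 'was', 'awesome'],
    ['the', 'action', 'movie', 'was', 'not', 'good'],
]})


def keep_important(tokens, vocab, dedupe=False):
    """Return the tokens found in vocab, in their original order.

    Hypothetical helper: with dedupe=True only the first occurrence of each
    word is kept (dicts preserve insertion order in Python 3.7+).
    """
    kept = [t for t in tokens if t in vocab]
    if dedupe:
        kept = list(dict.fromkeys(kept))
    return kept


# List output: keeps the data tokenized for later pipeline steps.
# Extra keyword arguments to Series.apply are forwarded to the function.
df['important_tokens'] = df['tokenized_normalized_reviews'].apply(
    keep_important, vocab=important_set
)

# String output in the requested "actor, action, awesome" form.
df['review_important_words'] = df['important_tokens'].str.join(', ')

print(df['important_tokens'].tolist())
# [['actor', 'action', 'awesome'], ['action']]
print(df['review_important_words'].tolist())
# ['actor, action, awesome', 'action']
```

Repeated important words are kept by default (a review mentioning "action" twice yields "action, action"); passing dedupe=True through apply keeps only the first occurrence of each.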
