Mitchell Stearns

Reputation: 17

How to use pandas column values as lookup within other dataframe

I have two pandas dataframes: movie_review_df, which has a single column of open-text movie reviews, and movie_ngrams_df, which contains the most common n-grams (the top 5 unigrams and the top 5 bigrams) found in movie_review_df.

I would like to write a function that iterates over every row of the word/word phrase column in movie_ngrams_df and uses each value as a lookup to find the reviews that contain that word or word phrase.

Imagine my movie_ngrams_df has 2 rows across 2 columns.

1) The word 'love' in column a (ngram_wordphrase) and 'one' in column b (ngram_group)

2) The phrase 'too long' in column a and 'two' in column b
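
For concreteness, the two frames described above could be built roughly like this. The review text is made up purely for illustration, and the 'reviews' column name is taken from the attempt further down; only the two n-gram rows come from the description.

```python
import pandas as pd

# Made-up review text, for illustration only
movie_review_df = pd.DataFrame({
    "reviews": [
        "I love this movie",
        "Great cast but far too long",
        "Not my kind of film",
    ]
})

# The two n-gram rows described above
movie_ngrams_df = pd.DataFrame({
    "ngram_wordphrase": ["love", "too long"],
    "ngram_group": ["one", "two"],
})

print(movie_ngrams_df)
```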

I think a function that uses a loop and a .str.contains() call will work, but I can't seem to wrap my head around it.

This is somewhat how I would want it to work.

import pandas as pd

def ngram_lookup(ngrams, reviews_df):
    # Filter the reviews once per n-gram, then stack the pieces into one dataframe
    return pd.concat(
        [reviews_df[reviews_df['reviews'].str.contains(word)] for word in ngrams]
    )

I want a function that searches every review in movie_review_df and pulls out the reviews that contain the word 'love'. The output should be a new df (ngram_detail_df) where each row holds the word_phrase (e.g. 'love' in column a) and the full text of one review that contains it (in column b). Each word_phrase will therefore likely be listed multiple times in column a.

THEN (you knew it was coming) I want to be able to do the same thing for the next word_phrase in our movie_ngrams_df which was 'too long'. I want to append these new 'too long' results to the results returned from our 'love' search so that at the end, we have just one df containing the top word_phrases and each movie review where that word/word_phrase is present.

Upvotes: 0

Views: 176

Answers (1)

AndrewH

Reputation: 234

What about something like this?

words = movie_ngrams_df["ngram_wordphrase"].array
ngram_detail_df = movie_review_df.copy()

# Add one boolean column per n-gram: True where the review contains it
for word in words:
    ngram_detail_df[word] = ngram_detail_df["reviews"].apply(lambda x: word in x)

# Reshape to long format: one row per (review, n-gram) pair
ngram_detail_df = ngram_detail_df.melt(id_vars=["reviews"])
# Keep only the pairs where the n-gram was found
ngram_detail_df = ngram_detail_df[ngram_detail_df["value"] == True]
ngram_detail_df = ngram_detail_df.loc[:, ["reviews", "variable"]]
ngram_detail_df.rename(columns={"variable": "ngram"}, inplace=True)
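
If you would rather stay close to the loop-plus-contains idea from the question, something along these lines might also work. This is only a sketch: the sample data is invented, the text_col parameter is a hypothetical convenience, and treating each n-gram as a literal, case-insensitive substring is an assumption about what you want.

```python
import pandas as pd

# Invented sample data standing in for the frames in the question
movie_review_df = pd.DataFrame({
    "reviews": [
        "I love this movie",
        "Great cast but far too long",
        "Too slow for me",
        "I love the soundtrack, though it runs too long",
    ]
})
movie_ngrams_df = pd.DataFrame({
    "ngram_wordphrase": ["love", "too long"],
    "ngram_group": ["one", "two"],
})


def ngram_lookup(ngrams, reviews_df, text_col="reviews"):
    """Return one row per (n-gram, matching review) pair."""
    frames = []
    for ngram in ngrams:
        # regex=False treats the n-gram as a literal substring,
        # case=False ignores case, na=False skips missing reviews
        mask = reviews_df[text_col].str.contains(
            ngram, case=False, regex=False, na=False
        )
        matched = reviews_df.loc[mask, [text_col]].copy()
        # Put the n-gram in the first column, next to each matching review
        matched.insert(0, "ngram_wordphrase", ngram)
        frames.append(matched)
    # Stack the per-n-gram results into a single dataframe
    return pd.concat(frames, ignore_index=True)


ngram_detail_df = ngram_lookup(movie_ngrams_df["ngram_wordphrase"], movie_review_df)
print(ngram_detail_df)
```

A review that contains several n-grams appears once per n-gram, so the last sample review shows up under both 'love' and 'too long'. Note that this is plain substring matching, so 'love' would also match 'glove'; a word-boundary regex would be needed to avoid that.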

Upvotes: 1
