Reputation: 463
I have a dataframe with user comments on movies and would like to extract the cases where a user describes a movie as "movie1" meets "movie2":
User id      Old id_New id      Score  Comments
947952018    3101_771355141     3.0    If you want to see a comedy and have a stupid ...
805407067    11903_18330        5.0    Argento's fever dream masterpiece. Fairy tale ...
901306244    16077_771225176    4.5    Evil Dead II meets Brothers Grimm and Hawkeye ...
901306244    NaN_381422014      1.0    Biggest disappointment! There's a host of ...
15169683     NaN_22471          3.0    You know in the original story of Pinocchio he...
I've written a function that takes a comment, finds the word "meets", and returns the first n words before and after it. The hope is that these capture the essence of the titles of movie1 and movie2, which I plan to fuzzy match later against titles in another dataframe.
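def parse_movie(comment, num_words):
    # Split the comment into (text before, 'meets', text after)
    words = comment.partition('meets')
    # rsplit keeps the words closest to 'meets'; a plain split would leave
    # the unsplit remainder of the text as the last element
    words_before = words[0].rsplit(maxsplit=num_words)[-num_words:]
    words_after = words[2].split(maxsplit=num_words)[:num_words]
    movie1 = ' '.join(words_before)
    movie2 = ' '.join(words_after)
    return movie1, movie2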
How can I apply this function on the comments column of the original pandas dataframe and put the returned movie1 and movie2 titles in separate columns? I tried
df['Comments'].apply(parse_movie)
but then I cannot specify the num_words I'd like to use. Calling the function directly on the column also doesn't work, and I'm not sure how to put the returned titles into new columns.
parse_movie(sample['Comments'], 4)
AttributeError: 'Series' object has no attribute 'partition'
Suggestions would be appreciated!
Upvotes: 0
Views: 366
Reputation: 174
Based on the answer to "how to split column of tuples in pandas dataframe?", this can be done with a lambda function and apply(pd.Series). The lambda passes num_words through to parse_movie, and the results are saved into the dataframe columns 'movie1' and 'movie2'.
num_words = 4
df[['movie1','movie2']] = df['Comments'].apply(lambda comment: parse_movie(comment, num_words)).apply(pd.Series)
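As a self-contained sketch of the same idea, the snippet below passes num_words through the `args` parameter of `Series.apply` instead of a lambda, and builds the two columns from the resulting list of tuples rather than with apply(pd.Series). The sample comments are made up for illustration, and parse_movie is a variant of the function from the question that uses rsplit for the words before "meets".

```python
import pandas as pd

def parse_movie(comment, num_words):
    # Split around the first "meets"; rsplit keeps the words closest to it.
    before, _, after = comment.partition('meets')
    movie1 = ' '.join(before.rsplit(maxsplit=num_words)[-num_words:])
    movie2 = ' '.join(after.split(maxsplit=num_words)[:num_words])
    return movie1, movie2

# Made-up sample data, for illustration only.
df = pd.DataFrame({'Comments': [
    'Evil Dead II meets Brothers Grimm and Hawkeye in the woods',
    'Biggest disappointment of the year',
]})

# Extra positional arguments are forwarded via `args`, so no lambda is needed.
pairs = df['Comments'].apply(parse_movie, args=(4,))

# Turn the Series of (movie1, movie2) tuples into two columns,
# keeping the original index so rows stay aligned.
df[['movie1', 'movie2']] = pd.DataFrame(pairs.tolist(), index=df.index)
print(df[['movie1', 'movie2']])
```

Note that a comment without "meets" still produces a movie1 value (its last num_words words) and an empty movie2, so it may be worth filtering first, for example with df['Comments'].str.contains('meets').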
Upvotes: 1