Reputation: 49
I have a dataset that contains hotel names and a review for each hotel. I want to apply sentiment analysis only to the ten most frequently occurring hotels; the dataset contains around 500 hotels. How can I select the reviews for only those top 10 hotels? I tried:
DF[DF['hotels']==DF['hotels'].value_counts()[:10]]['review']
but it didn't work; it raised an error:
Can only compare identically-labeled Series objects
Any clues?
Upvotes: 1
Views: 27
Reputation: 260590
Rather use isin on the index of your value_counts output, and loc instead of chained slicing to avoid a SettingWithCopyWarning if you later use this sliced Series.
out = DF.loc[DF['hotels'].isin(DF['hotels'].value_counts().index[:10]), 'review']
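For illustration, here is a minimal sketch of the same approach on a toy DataFrame. The column names follow the question, but the data and the variable names n and top_hotels are made up for this example. The original == attempt fails because value_counts returns a Series indexed by hotel name, whose labels do not match the row labels of DF['hotels']; isin only checks membership, so no label alignment is needed.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
DF = pd.DataFrame({
    "hotels": ["A", "A", "A", "B", "B", "C"],
    "review": ["great", "good", "ok", "bad", "fine", "noisy"],
})

n = 2  # assumption for the toy data; use 10 on the real dataset

# value_counts() sorts by count in descending order, so the first n
# index labels are the n most frequent hotel names.
top_hotels = DF["hotels"].value_counts().index[:n]

# isin builds a boolean mask over the rows of DF; loc then selects
# the matching rows and the 'review' column in a single step.
out = DF.loc[DF["hotels"].isin(top_hotels), "review"]

print(out.tolist())  # ['great', 'good', 'ok', 'bad', 'fine']
```

If several hotels are tied at the cutoff, which of them falls inside the first n labels depends on how value_counts orders the ties, so it may be worth checking the counts around the boundary.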
Upvotes: 2