Reputation: 106
I want to scrape all the reviews for a particular movie from the IMDB website. I am using the BeautifulSoup package with the 'html.parser' parser for this.
Link
Consider this link: I want to scrape all the reviews for this movie (69 in total), but since only 25 reviews are visible on the page, BeautifulSoup extracts only those 25 instead of all of them.
My Code:
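import requests
from bs4 import BeautifulSoup

url = "https://www.imdb.com/title/tt6654210/reviews?ref_=tt_ov_rt"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
# getReviewsList is my own helper (definition not shown) that extracts the reviews from the soup
review_list = getReviewsList(soup)
len(review_list)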
Output:
25
I am quite new to web scraping and would be grateful if anyone could help me with this.
Upvotes: 0
Views: 1289
Reputation: 4723
If you want to scrape a page, you first need to understand how it works: inspect it with the browser's dev tools, analyze the network calls, and then emulate the call that you need.
In this case, the page uses AJAX to load the reviews page by page, so the initial HTML contains only the first batch. To get the next batch, you have to call:
https://www.imdb.com/title/tt6654210/reviews/_ajax?ref_=undefined&paginationKey=g4wp7dreqyzd4zql7kvh3obyrtum6az4y4hhzo5ziwr26fbyhvrl4ty4o4yvzmjkcrxndtvd7hmf6y6yefcmwoi6hkwovare
The pagination key is provided in the page by the following tag:
<div class="load-more-data" data-key="g4wp7dreqyzd4zql7kvh3obyrtum6az4y4hhzo5ziwr26fbyhvrl4ty4o4yvzmjkcrxndtvd7hmf6y6yefcmwoi6hkwovare" data-ajaxurl="/title/tt6654210/reviews/_ajax">
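As a rough sketch of how this could be emulated in Python: read `data-key` and `data-ajaxurl` from the `load-more-data` tag, build the `_ajax` URL, and repeat until the tag no longer appears. The helper names `next_page_url` and `iter_pages` are made up for this example, and the sketch assumes that every AJAX response carries its own `load-more-data` tag with the next key until the last page is reached. I have not verified that against the live site, and the markup may change.

```python
import requests
from bs4 import BeautifulSoup

BASE = "https://www.imdb.com"


def next_page_url(soup, base=BASE):
    """Build the next AJAX URL from the load-more tag, or return None if it is absent."""
    tag = soup.find("div", class_="load-more-data")
    if tag is None or not tag.get("data-key") or not tag.get("data-ajaxurl"):
        return None
    # Same URL shape as the AJAX call shown above.
    return f"{base}{tag['data-ajaxurl']}?ref_=undefined&paginationKey={tag['data-key']}"


def iter_pages(start_url, fetch=None):
    """Yield one BeautifulSoup object per page, following the pagination keys."""
    if fetch is None:
        # Default fetcher: a plain GET request returning the response body.
        fetch = lambda url: requests.get(url).text
    url = start_url
    while url:
        soup = BeautifulSoup(fetch(url), "html.parser")
        yield soup
        # Assumption: the last page has no load-more tag, which ends the loop.
        url = next_page_url(soup)
```

With the code from the question, usage would look something like `for soup in iter_pages(url): review_list.extend(getReviewsList(soup))`, starting from an empty `review_list`. The `fetch` parameter is only there so the loop can be tried out on saved HTML without hitting the site.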
I hope this helps.
Upvotes: 1