I'm trying to scrape data from RateMyProfessor, but because it's a React app and all of the teacher information is created dynamically, requests.get() doesn't return the data I'm trying to parse. However, I found that the data is in a script tag that is included in the HTML returned by requests.get(). I want to know how to retrieve the information from this tag:
<script> window.__RELAY_STORE__ = {"legacyId":774048,"avgRating":2.6,"numRatings":12} </script>
The relay store contains more data than this, but this is exactly the part I'm trying to parse. I should also add that the page has multiple script tags.
I'm currently using Selenium to render the whole page, but it takes a really long time. Is there a way to access window.__RELAY_STORE__ directly so that I don't need to render the site each time?
For anyone curious, this is the code I used to fetch the page that contains the relay store:
import requests
page = requests.get("https://www.ratemyprofessors.com/search/teachers?query=Michael&sid=U2Nob29sLTM5OQ==")
print(page.content)
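Because the relay store is already present in the HTML that requests returns, rendering with Selenium may not be needed for this part. Below is a minimal sketch, assuming the value assigned to window.__RELAY_STORE__ is valid JSON (as in the snippet from the question) rather than an arbitrary JavaScript expression. It finds the assignment with a regex and then lets json.JSONDecoder.raw_decode parse exactly one JSON value from that point, so a trailing semicolon or closing tag does not matter. The helper name extract_relay_store is hypothetical, and the test string is the sample from the question; on the live page you would pass page.text instead. The full store may nest these fields differently than the sample does.

```python
import json
import re

# Sample taken from the question; on the live page this would be page.text.
html = '<script> window.__RELAY_STORE__ = {"legacyId":774048,"avgRating":2.6,"numRatings":12} </script>'


def extract_relay_store(text):
    """Return the object assigned to window.__RELAY_STORE__, or None if absent.

    Assumption (hypothetical helper): the assigned value is valid JSON.
    """
    match = re.search(r"window\.__RELAY_STORE__\s*=\s*", text)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at match.end() and ignores
    # whatever follows it (a ";", "</script>", or further statements).
    store, _end = json.JSONDecoder().raw_decode(text, match.end())
    return store


store = extract_relay_store(html)
print(store["legacyId"], store["avgRating"], store["numRatings"])
# prints: 774048 2.6 12
```

raw_decode is used instead of json.loads because json.loads rejects any trailing text after the JSON value, while raw_decode stops cleanly at the end of the object.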
If you inspect the page, you will notice that the script is inside the body. Just extract the script from the body, as shown in the code.
import re
import requests
from bs4 import BeautifulSoup
page = requests.get("https://www.ratemyprofessors.com/search/teachers?query=Michael&sid=U2Nob29sLTM5OQ==")
soup = BeautifulSoup(page.text, "html.parser")
# extract the part you want here: the first <script> inside <body>
script = soup.find("body").find("script")
# pre-process the script text with a regex that captures bracketed [...] sections
# (greedy, so each match runs from a "[" to the last "]" on its line)
for items in re.findall(r"(\[.*\])", script.string):
    print(items)
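Since the question mentions multiple script tags, taking the first script in the body can pick the wrong one. The sketch below instead checks every script for the window.__RELAY_STORE__ assignment. It swaps in the standard-library html.parser for BeautifulSoup so it has no third-party dependency (with BeautifulSoup, a similar filter is soup.find("script", string=re.compile("__RELAY_STORE__"))). ScriptCollector and find_relay_store are hypothetical names, the sample HTML is made up around the snippet from the question, and it assumes the assigned value is valid JSON.

```python
import json
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element in a document."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            # Script text may arrive in several chunks, so append to it.
            self.scripts[-1] += data


def find_relay_store(html):
    """Return the window.__RELAY_STORE__ object, or None if no script sets it."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    for text in collector.scripts:
        match = re.search(r"window\.__RELAY_STORE__\s*=\s*", text)
        if match:
            # Assumption: the assigned value is valid JSON.
            store, _end = json.JSONDecoder().raw_decode(text, match.end())
            return store
    return None


# Made-up page with several scripts; only the last one holds the store.
sample = (
    "<html><head><script src='app.js'></script></head><body>"
    "<script>window.dataLayer = [];</script>"
    '<script> window.__RELAY_STORE__ = {"legacyId":774048,"avgRating":2.6,"numRatings":12} </script>'
    "</body></html>"
)
print(find_relay_store(sample)["numRatings"])  # prints 12
```

With requests, the same call would be find_relay_store(page.text).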