Trollermaner

Reputation: 1

How to scrape data from a script tag in a React web app in Python?

I'm trying to scrape data from RateMyProfessor, but since it's a React app and all the teacher information is created dynamically, requests.get() doesn't return the rendered data I'm trying to parse. However, I found that the data is in a script tag that is included in the requests.get() response. How can I retrieve the information from a tag like this?

<script> window.__RELAY_STORE__ = {"legacyId":774048,"avgRating":2.6,"numRatings":12} </script>

There's more in the relay store, but this is exactly the part I'm trying to parse. Note that the page contains multiple script tags.

I'm currently using Selenium to render the whole page, but it takes a really long time. Is there a way to access window.__RELAY_STORE__ directly so that I don't need to render the site each time?

For anyone curious, this is what I wrote to fetch the page that contains the relay store:

import requests

page = requests.get("https://www.ratemyprofessors.com/search/teachers?query=Michael&sid=U2Nob29sLTM5OQ==")
print(page.content)

Upvotes: 0

Views: 849

Answers (1)

Sin Han Jinn

Reputation: 684

If you inspect the page, you'll notice that the script is inside the body. Extract the script from the body as shown in the code:

import requests
from bs4 import BeautifulSoup
import re

page = requests.get("https://www.ratemyprofessors.com/search/teachers?query=Michael&sid=U2Nob29sLTM5OQ==")
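# Name the built-in parser explicitly so the result doesn't depend on which parsers are installed
soup = BeautifulSoup(page.text, "html.parser")
# Extract the first script tag inside the body
script = soup.find("body").find("script")
# Use a regex to pull the bracketed portions out of the script's text
for items in re.findall(r"(\[.*\])", script.string):
    print(items)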

This prints each bracketed match found in the script's text.
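Since the question mentions multiple script tags and wants the object assigned to window.__RELAY_STORE__, an alternative is to search the raw response text for that assignment and let the json module parse the value that follows. The following is only a minimal sketch, not the method used above: the function name extract_relay_store is hypothetical, the sample string is the snippet from the question, and it assumes the assigned value is valid JSON.

```python
import json
import re

# Matches the assignment prefix; the JSON value starts right after it.
ASSIGNMENT = re.compile(r"window\.__RELAY_STORE__\s*=\s*")


def extract_relay_store(html):
    """Return the value assigned to window.__RELAY_STORE__, or None if absent.

    Hypothetical helper: assumes the assigned value is valid JSON.
    """
    match = ASSIGNMENT.search(html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (for example ";" or "</script>").
    store, _end = json.JSONDecoder().raw_decode(html, match.end())
    return store


# Sample input copied from the snippet in the question, with an extra
# unrelated script tag in front of it.
sample = (
    "<body><script>var unrelated = [1, 2];</script>"
    '<script> window.__RELAY_STORE__ = {"legacyId":774048,'
    '"avgRating":2.6,"numRatings":12} </script></body>'
)

store = extract_relay_store(sample)
print(store["avgRating"], store["numRatings"])
```

With a live page you would pass page.text from requests.get() instead of the sample string. Because the result is a regular Python dict, the fields can be read by key rather than picked out with further regexes.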

Upvotes: 1
