GalleyWest

Reputation: 59

How To Delay Python Requests Library To Allow Data To Populate

I am attempting to get the data from a webpage that is using .aspx. I am able to get all of my data except one value, because it appears that value takes a few moments to load after the HTML loads.

My code currently looks like this:

import requests # Requesting HTML
import bs4 as bs # Parsing HTML
url_two = "https://www.walottery.com/Scratch/Explorer.aspx?id=1463"   
r_two = requests.get(url_two)
soup = bs.BeautifulSoup(r_two.text, "lxml")
print(soup.find("strong", {"class": "ticket-explorer-detail-info-printed"}))

However, when I print, the value is <strong class="ticket-explorer-detail-info-printed">N/A</strong>.

If you "inspect element" on the webpage, you can see the data change from what I pasted above to this: <strong class="ticket-explorer-detail-info-printed">2,428,400</strong>.

How can I cause a slight delay so that my requests library allows me to get the calculated value, rather than "N/A"?

Upvotes: 0

Views: 3958

Answers (2)

Dan-Dev

Reputation: 9430

The web-page is dynamically generated from JSON embedded in a script element in the HTML. You can extract the JSON and parse it to get the data you want or use Selenium to render the JavaScript on the page. To extract the JSON:

import requests
import json
from bs4 import BeautifulSoup

url = 'https://www.walottery.com/Scratch/Explorer.aspx?id=1463'
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
# Find the script element containing the JSON the web page is dynamically generated from.
anchor = "WaLottery.Scratch.data = "
s = soup.find(lambda tag:tag.name=="script" and anchor in tag.text)
# Extract the JSON.
j = s.text[s.text.find("parse")+7:s.text.find("'),")]
# Load the JSON.
d = json.loads(j)
# Read the data from the JSON.
for game in d['Games']:
    print(game['Id'], game['TicketsPrinted'])

Outputs:

1503 3,232,300
1497 2,427,400
1496 2,585,600
1493 3,467,000
1491 2,169,350
1490 2,194,350
1489 3,862,600
1488 4,832,950
1486 1,801,975
1483 2,422,200
1482 2,410,200
1481 2,450,400
1480 1,802,100
1479 1,320,300
1478 1,822,000
1476 5,236,000
1475 3,496,200
1474 3,155,000
1473 2,127,300
1472 1,112,265
1470 2,350,250
1469 3,120,050
1468 955,800
1467 2,161,550
1466 1,339,400
1465 556,000
1464 2,213,350
1463 2,428,400
1462 2,419,600
1461 2,434,600
1460 2,591,900
1459 3,887,000
1458 3,468,500
1457 2,180,300
1456 2,110,100
1455 2,089,200
1454 543,235
1453 2,421,600
1452 2,418,200
1451 2,400,800
1450 3,127,050
1449 2,167,400
1448 2,379,950
1446 4,838,700
1445 1,233,550
1444 2,456,550
1442 1,770,425
1441 3,838,700
1440 13,647,500
1439 3,255,400
1433 2,859,400
1431 3,158,450
1422 3,332,500
1415 5,192,000
1410 1,836,575
1409 3,567,270
1405 2,409,500
1391 2,162,100
1379 2,467,725
1373 3,645,075

The one you were looking at was:

 1463 2,428,400
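To see how the slicing above works without hitting the network, here is the same extraction applied to a hypothetical miniature of the script element's contents (the sample string below is an illustration, not the page's actual markup; the live page embeds its data the same way, as `WaLottery.Scratch.data = JSON.parse('...')`):

```python
import json

# Hypothetical sample of the script element's contents.
script_text = """WaLottery.Scratch.data = JSON.parse('{"Games": [{"Id": 1463, "TicketsPrinted": "2,428,400"}]}');"""

# "parse" + 7 skips past "parse('" (7 characters) to the first character
# of the JSON string literal; "')" marks where that literal ends.
j = script_text[script_text.find("parse") + 7 : script_text.find("')")]

# The slice is plain JSON, so json.loads can parse it.
d = json.loads(j)
for game in d["Games"]:
    print(game["Id"], game["TicketsPrinted"])
```

This prints `1463 2,428,400`, matching the value the question was after.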

Upvotes: 2

xrisk

Reputation: 3898

All the data you need is embedded inside the HTML inside a script tag. Read the contents of that script tag using BeautifulSoup and parse the JSON. You can find the tickets sold inside that JSON object.
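A minimal sketch of that approach, using a regex instead of string slicing (the sample string below is a hypothetical stand-in for the script tag's contents; the live page wraps its data in `JSON.parse('...')`):

```python
import json
import re

# Hypothetical sample of the script tag's contents.
script_text = """WaLottery.Scratch.data = JSON.parse('{"Games": [{"Id": 1463, "TicketsPrinted": "2,428,400"}]}');"""

# Capture everything between JSON.parse(' and ') non-greedily.
m = re.search(r"JSON\.parse\('(.+?)'\)", script_text, re.DOTALL)
data = json.loads(m.group(1))
print(data["Games"][0]["TicketsPrinted"])
```

In practice you would read `script_text` from the tag found with BeautifulSoup, as in the other answer.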

Upvotes: 1
