Pavol Bujna

Reputation: 179

BeautifulSoup is missing content

I'm trying to scrape the number of visitors at my local climbing centre.

import requests
from bs4 import BeautifulSoup
page = requests.get("https://portal.rockgympro.com/portal/public/c3b9019203e4bc4404983507dbdf2359/occupancy?&iframeid=occupancyCounter&fId=1644")
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find('span', id="count")
print(results)

It's printing this:

<span id="count" style="display:inline"></span>

That's nice, but the number 19 is missing... What am I doing wrong?


Upvotes: 0

Views: 820

Answers (4)

chitown88

Reputation: 28595

It's there in JSON format in one of the <script> tags of the HTML. You just need to pull it out.

import requests
import json
from bs4 import BeautifulSoup

url = 'https://portal.rockgympro.com/portal/public/c3b9019203e4bc4404983507dbdf2359/occupancy?&iframeid=occupancyCounter&fId=1644'
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')
scriptStr = str(soup.find_all('script')[2]).split('var data = ')[-1].split(';')[0].replace("'",'"')
last_char_index = scriptStr.rfind(",")
scriptStr = scriptStr[:last_char_index] + '}'
scriptStr = scriptStr.replace('&nbsp', ' ')

jsonData = json.loads(scriptStr)



count = jsonData['REA']['count']
capacity = jsonData['REA']['capacity']
lastUpdate = jsonData['REA']['lastUpdate']

print(f'{count} of {capacity} Climbers\n{lastUpdate}')

Output:

58 of 220 Climbers
Last updated: now  (5:20 PM)
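
The index-based lookup (`find_all('script')[2]`) and the manual comma trimming above depend on the exact page layout. As a rough sketch of a more tolerant variant, the snippet below locates the `var data = {...};` assignment with a regular expression and cleans it up before handing it to `json.loads`. Note that `SAMPLE_HTML` and `extract_occupancy` are made-up names for illustration, and the sample markup is only an assumption modeled on the fields used above, not a copy of the real page.

```python
import json
import re

# Hypothetical stand-in for the page source; the real markup may differ.
SAMPLE_HTML = """
<html><body>
<span id="count" style="display:inline"></span>
<script>
var data = {
  'REA': {
    'capacity': 220,
    'count': 58,
    'lastUpdate': 'Last updated:&nbsp;now (5:20 PM)',
  },
};
</script>
</body></html>
"""


def extract_occupancy(html: str) -> dict:
    """Pull the JavaScript object assigned to `var data` and parse it as JSON."""
    # Grab everything from the first "{" up to the "}" that is followed by ";".
    match = re.search(r"var data\s*=\s*(\{.*?\})\s*;", html, re.DOTALL)
    if match is None:
        raise ValueError("no 'var data = {...};' assignment found")
    raw = match.group(1)
    # JavaScript allows single quotes and trailing commas; JSON does not.
    raw = raw.replace("'", '"')
    raw = re.sub(r",\s*([}\]])", r"\1", raw)
    # Turn the HTML non-breaking-space entity into a plain space.
    raw = raw.replace("&nbsp;", " ")
    return json.loads(raw)


data = extract_occupancy(SAMPLE_HTML)
gym = data["REA"]
print(f"{gym['count']} of {gym['capacity']} Climbers\n{gym['lastUpdate']}")
```

With the live page you would pass `response.text` instead of `SAMPLE_HTML`. The quote and trailing-comma cleanup is naive (it would also rewrite those characters inside string values), so treat it as a starting point rather than a general JavaScript-to-JSON converter.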

Upvotes: 3

Ribson Cyber

Reputation: 11

You can try the requests_html module to get dynamic values that are computed by JavaScript. I tried the logic below and it worked for me on your site.

from bs4 import BeautifulSoup
from requests_html import HTMLSession

url = "Your Site Link"  # replace with the URL of the occupancy page

# create an HTML Session object
session = HTMLSession()

# Use the object above to connect to needed webpage
resp = session.get(url)

# Run JavaScript code on webpage
resp.html.render(sleep=10)


soup = BeautifulSoup(resp.html.html, 'lxml')
results = soup.find('span', id="count")
print(results)


Upvotes: 1

G.S

Reputation: 565

In the dev tools, under one of the <script> tags, you can see that many of those figures are generated after the page loads by the JavaScript function showGym(). To let those figures generate, you could use a browser driver tool like webbot or Selenium, which can wait on a page long enough for the JavaScript to execute and populate those fields. It might be possible to have requests do that, but I don't know, as I've only used webbot when running into problems like these, since it's very easy to use.

Upvotes: 0

vtasca

Reputation: 1770

You're not doing anything wrong. The issue is that the website populates the <span> element using JavaScript, which only runs in a browser after the page has loaded, so the HTML that requests receives still has the element empty.

Unfortunately, the requests library cannot run JavaScript, since it is purely an HTTP client. I would recommend checking out something like Selenium, which drives a real browser and can wait for the JavaScript to run before you scrape the HTML.
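
For illustration, a minimal Selenium sketch might look like the following. It assumes Selenium 4 with a local Chrome installation. The names `nonempty_count_text` and `fetch_count` are hypothetical helpers, not part of Selenium, and the 15-second timeout is an arbitrary choice. The wait condition simply polls the `<span id="count">` element from the question until its text is no longer empty.

```python
def nonempty_count_text(driver):
    """Wait condition: text of <span id="count"> once it is filled in, else False."""
    # "id" is the string value of selenium's By.ID locator strategy.
    text = driver.find_element("id", "count").text.strip()
    return text or False


def fetch_count(url, timeout=15):
    """Open the page in headless Chrome and return the rendered counter text."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Polls until the condition returns something truthy or the timeout expires.
        return WebDriverWait(driver, timeout).until(nonempty_count_text)
    finally:
        driver.quit()
```

Calling `fetch_count(url)` with the occupancy URL from the question should then return the counter text as a string, or raise Selenium's `TimeoutException` if the span never gets filled within the timeout.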

Upvotes: 1
