Siddharth Kulkarni

Reputation: 135

How to find the data source when a website uses JavaScript

What I want to achieve

I am trying to scrape the website below using Beautiful Soup, but when I load the page, the HTML I get back does not contain the table that shows the various quotes. In my previous posts, people helped me by providing the URL that actually feeds the data to the main page, but I am not sure how they found it. Once I have pulled the data, I can do the rest.
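To illustrate why the table comes back empty, here is a minimal offline sketch. The HTML string and the `loadQuotes()` call inside it are made up for illustration (they are not taken from the CME page); only the table id matches the one used on the real site. Beautiful Soup parses the markup it is given and never runs the `<script>`, so the table it sees has no rows.

```python
from bs4 import BeautifulSoup

# Made-up page: the table body is empty in the raw HTML and would only be
# filled in by the hypothetical loadQuotes() JavaScript function in a browser.
RAW_HTML = """
<html><body>
  <table id="quotesFuturesProductTable1"><tbody></tbody></table>
  <script>loadQuotes('quotesFuturesProductTable1');</script>
</body></html>
"""


def count_table_rows(html, table_id):
    """Return how many <tr> rows Beautiful Soup sees in the table, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"id": table_id})
    if table is None:
        return None
    return len(table.find_all("tr"))


# The <script> is never executed, so the table is still empty.
print(count_table_rows(RAW_HTML, "quotesFuturesProductTable1"))  # → 0
```

In a real browser the script would run and insert the rows; fetching the page with a plain HTTP client and parsing it gives you only the empty skeleton.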

Website

https://www.cmegroup.com/trading/energy/refined-products/methanol-t2-fob-rdam-icis.html

What I have tried

I tried using the Selenium driver but got various errors that might take more time to resolve, and I am not comfortable using Selenium. Eventually I plan to create an exe that downloads the information to an Excel file.

Upvotes: 0

Views: 80

Answers (3)

Dan-Dev

Reputation: 9430

If you are not comfortable with Selenium, use PyQt:

"""
Install PyQt on Ubuntu:
    sudo apt-get install python3-pyqt5
    sudo apt-get install python3-pyqt5.qtwebengine
or on other OSes (64-bit versions of Python; QtWebEngine is a separate package since PyQt5 5.12):
    pip3 install PyQt5 PyQtWebEngine
"""

import sys

from bs4 import BeautifulSoup
from PyQt5.QtCore import QUrl
from PyQt5.QtWidgets import QApplication
from PyQt5.QtWebEngineWidgets import QWebEngineView


class Render(QWebEngineView):
    """Load a URL in QtWebEngine (which executes JavaScript) and capture the rendered HTML."""

    def __init__(self, url):
        # A QApplication must exist before any widget is constructed.
        self.app = QApplication(sys.argv)
        super().__init__()
        self.html = None
        self.loadFinished.connect(self._load_finished)
        self.load(QUrl(url))
        # Run the event loop until _callable() calls quit().
        self.app.exec_()

    def _load_finished(self, result):
        # toHtml() is asynchronous: the HTML is delivered to the callback.
        self.page().toHtml(self._callable)

    def _callable(self, data):
        self.html = data
        self.app.quit()


url = 'https://www.cmegroup.com/trading/energy/refined-products/methanol-t2-fob-rdam-icis.html'
html_source = Render(url).html
soup = BeautifulSoup(html_source, 'html.parser')
table = soup.find('table', {'id': 'quotesFuturesProductTable1'})
if table is None:
    sys.exit('Quotes table not found; the page layout may have changed.')
for tr in table.find_all('tr'):
    print(tr.get_text(" ", strip=True))

Output:

Month Charts Last Change Prior Settle Open High Low Volume Hi / Low Limit Updated
NOV 2018 Show Price Chart - - 357.00 - - - 0 No Limit / No Limit 18:01:39 CT 31 Oct 2018
DEC 2018 Show Price Chart - - 357.00 - - - 0 No Limit / No Limit 18:01:39 CT 31 Oct 2018
JAN 2019 Show Price Chart - - 345.00 - - - 0 No Limit / No Limit 18:01:39 CT 31 Oct 2018
FEB 2019 Show Price Chart - - 345.00 - - - 0 No Limit / No Limit 18:01:36 CT 31 Oct 2018
MAR 2019 Show Price Chart - - 342.00 - - - 0 No Limit / No Limit 18:02:29 CT 31 Oct 2018
APR 2019 Show Price Chart - - 339.00 - - - 0 No Limit / No Limit 18:01:47 CT 31 Oct 2018
MAY 2019 Show Price Chart - - 334.00 - - - 0 No Limit / No Limit 18:03:23 CT 31 Oct 2018
JUN 2019 Show Price Chart - - 334.00 - - - 0 No Limit / No Limit 18:01:53 CT 31 Oct 2018
JUL 2019 Show Price Chart - - 337.00 - - - 0 No Limit / No Limit 16:45:00 CT 31 Oct 2018
AUG 2019 Show Price Chart - - 337.00 - - - 0 No Limit / No Limit 16:45:00 CT 31 Oct 2018
SEP 2019 Show Price Chart - - 335.00 - - - 0 No Limit / No Limit 16:45:00 CT 31 Oct 2018
OCT 2019 Show Price Chart - - 335.00 - - - 0 No Limit / No Limit 16:45:00 CT 31 Oct 2018
NOV 2019 Show Price Chart - - 335.00 - - - 0 No Limit / No Limit 16:45:00 CT 31 Oct 2018
DEC 2019 Show Price Chart - - 335.00 - - - 0 No Limit / No Limit 16:45:00 CT 31 Oct 2018

Some warnings are also sent to standard error.
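Since the goal is to get the quotes into Excel, one option is to write the rows to a CSV file, which Excel opens directly. This is a minimal sketch using only the standard `csv` module; `write_rows_csv` is a hypothetical helper name, and producing a real .xlsx file would need a third-party library, which this sketch deliberately avoids.

```python
import csv
import io


def write_rows_csv(rows, fh):
    """Write an iterable of row lists to a file-like object as CSV; return the row count."""
    writer = csv.writer(fh)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


# Demo with two rows shaped like the output above (values copied from it).
buf = io.StringIO()
n = write_rows_csv([["Month", "Prior Settle"], ["NOV 2018", "357.00"]], buf)
print(n)  # → 2
print(buf.getvalue())
```

With the PyQt script, you could build the rows as `[[c.get_text(" ", strip=True) for c in tr.find_all(["th", "td"])] for tr in table.find_all("tr")]` and pass a file opened with `open("quotes.csv", "w", newline="", encoding="utf-8")`; `newline=""` stops the `csv` module's line endings from being doubled on Windows.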

Upvotes: 0

Arman Ordookhani

Reputation: 6546

Many web pages use JavaScript to change the page after it loads. These changes are not visible to Beautiful Soup because it only parses the HTML it is given and does not execute JavaScript. I can think of two options:

  • You could use tools like Selenium that actually run a full-fledged browser with JavaScript.
  • You could open the website in Chrome or Firefox, open the web inspector, then refresh the page. Watch for XHR requests in the Network tab; you may find the request that brings the data you are looking for. If you find it, you can load that URL directly instead of the main page.
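Once you have found such a request, you can often call it without a browser. The sketch below assumes the endpoint returns JSON; the URL, the `"quotes"` key and the field names are hypothetical placeholders, not the CME site's real API, so inspect the actual response in the Network tab to find the real ones. It uses the standard library's `urllib` to stay dependency-free.

```python
import json
from urllib.request import Request, urlopen

# HYPOTHETICAL placeholder: paste the XHR URL you find in the Network tab here.
DATA_URL = "https://www.example.com/path/to/quotes.json"


def fetch_json(url, timeout=30):
    """Download url and decode the response body as JSON."""
    # Some sites refuse requests that lack a browser-like User-Agent header.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_rows(payload, list_key, fields):
    """Return one list per record in payload[list_key], picking out `fields`.

    list_key and fields are assumptions about the response layout;
    missing keys come back as None.
    """
    return [[rec.get(f) for f in fields] for rec in payload.get(list_key, [])]


# Offline demo with a made-up payload; against the live endpoint you would call
# extract_rows(fetch_json(DATA_URL), "quotes", [...]) instead.
sample = {"quotes": [{"month": "NOV 2018", "priorSettle": "357.00"}]}
print(extract_rows(sample, "quotes", ["month", "priorSettle"]))
# → [['NOV 2018', '357.00']]
```

If the request only works from the browser, compare its request headers (cookies, Referer) in the Network tab with what the script sends.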

Upvotes: 1

Racerdude

Reputation: 74

Is there any way you could run a Python web client that actually executes the JavaScript on the page, so that you can then scrape the results?

Upvotes: 0
