Reputation: 135
I am trying to scrape the website below using Beautiful Soup, but when I load the page it does not return the table that shows the various quotes. In my previous posts, people helped me by providing the URL that actually feeds the main website, but I am not sure how they found it. Once I have pulled the data, I can do the rest.
https://www.cmegroup.com/trading/energy/refined-products/methanol-t2-fob-rdam-icis.html
I tried to use the Selenium driver but got various errors that would take more time to resolve, and I am not comfortable using Selenium. Eventually I plan to create an exe that downloads the information to an Excel file.
Upvotes: 0
Views: 80
Reputation: 9430
If you are not comfortable with Selenium, use PyQt:
"""
Install PyQt on Ubuntu:
    sudo apt-get install python3-pyqt5
    sudo apt-get install python3-pyqt5.qtwebengine
or on other OS (64 bit versions of Python):
    pip3 install PyQt5 PyQtWebEngine
(PyQtWebEngine is a separate package since PyQt5 5.12.)
"""
import sys

from bs4 import BeautifulSoup
from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEngineView
from PyQt5.QtWidgets import QApplication


class Render(QWebEngineView):
    """Load a URL in a headless web view and keep the rendered HTML."""

    def __init__(self, url):
        self.html = None
        self.app = QApplication(sys.argv)
        QWebEngineView.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.load(QUrl(url))
        # Blocks until callable() calls self.app.quit().
        self.app.exec_()

    def _loadFinished(self, result):
        # toHtml() is asynchronous; it passes the page source to the callback.
        self.page().toHtml(self.callable)

    def callable(self, data):
        self.html = data
        self.app.quit()


url = 'https://www.cmegroup.com/trading/energy/refined-products/methanol-t2-fob-rdam-icis.html'
html_source = Render(url).html
soup = BeautifulSoup(html_source, 'html.parser')
table = soup.find('table', {'id': 'quotesFuturesProductTable1'})
if table is None:
    # find() returns None when the table is not in the rendered HTML.
    sys.exit('Table quotesFuturesProductTable1 not found in the rendered page.')
for tr in table.find_all('tr'):
    print(tr.get_text(" ", strip=True))
Outputs:
Month Charts Last Change Prior Settle Open High Low Volume Hi / Low Limit Updated
NOV 2018 Show Price Chart - - 357.00 - - - 0 No Limit / No Limit 18:01:39 CT 31 Oct 2018
DEC 2018 Show Price Chart - - 357.00 - - - 0 No Limit / No Limit 18:01:39 CT 31 Oct 2018
JAN 2019 Show Price Chart - - 345.00 - - - 0 No Limit / No Limit 18:01:39 CT 31 Oct 2018
FEB 2019 Show Price Chart - - 345.00 - - - 0 No Limit / No Limit 18:01:36 CT 31 Oct 2018
MAR 2019 Show Price Chart - - 342.00 - - - 0 No Limit / No Limit 18:02:29 CT 31 Oct 2018
APR 2019 Show Price Chart - - 339.00 - - - 0 No Limit / No Limit 18:01:47 CT 31 Oct 2018
MAY 2019 Show Price Chart - - 334.00 - - - 0 No Limit / No Limit 18:03:23 CT 31 Oct 2018
JUN 2019 Show Price Chart - - 334.00 - - - 0 No Limit / No Limit 18:01:53 CT 31 Oct 2018
JUL 2019 Show Price Chart - - 337.00 - - - 0 No Limit / No Limit 16:45:00 CT 31 Oct 2018
AUG 2019 Show Price Chart - - 337.00 - - - 0 No Limit / No Limit 16:45:00 CT 31 Oct 2018
SEP 2019 Show Price Chart - - 335.00 - - - 0 No Limit / No Limit 16:45:00 CT 31 Oct 2018
OCT 2019 Show Price Chart - - 335.00 - - - 0 No Limit / No Limit 16:45:00 CT 31 Oct 2018
NOV 2019 Show Price Chart - - 335.00 - - - 0 No Limit / No Limit 16:45:00 CT 31 Oct 2018
DEC 2019 Show Price Chart - - 335.00 - - - 0 No Limit / No Limit 16:45:00 CT 31 Oct 2018
Some warnings are also sent to standard error.
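Since the goal is to get the quotes into a spreadsheet, here is a minimal sketch of splitting the table into one list per row and serialising it as CSV, which Excel can open. The helper names table_to_rows and rows_to_csv are made up for illustration, and sample_html is a cut-down stand-in for html_source with only three of the columns shown in the output above.

```python
import csv
import io

from bs4 import BeautifulSoup


def table_to_rows(html, table_id):
    """Return the table's rows as lists of cell strings ([] if the table is absent)."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table', {'id': table_id})
    if table is None:
        return []
    rows = []
    for tr in table.find_all('tr'):
        # Header cells are <th>, data cells are <td>; collect both in order.
        cells = [cell.get_text(' ', strip=True) for cell in tr.find_all(['th', 'td'])]
        if cells:
            rows.append(cells)
    return rows


def rows_to_csv(rows):
    """Serialise the rows to CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator='\n').writerows(rows)
    return buffer.getvalue()


# Hypothetical, shortened stand-in for html_source (assumption: the real
# table has more columns and may be structured differently).
sample_html = """
<table id="quotesFuturesProductTable1">
  <tr><th>Month</th><th>Prior Settle</th><th>Volume</th></tr>
  <tr><td>NOV 2018</td><td>357.00</td><td>0</td></tr>
  <tr><td>DEC 2018</td><td>357.00</td><td>0</td></tr>
</table>
"""

rows = table_to_rows(sample_html, 'quotesFuturesProductTable1')
csv_text = rows_to_csv(rows)
print(csv_text)
```

To use it with the real page, pass html_source instead of sample_html and write csv_text to a file opened with newline=''.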
Upvotes: 0
Reputation: 6546
Many pages on the web use JavaScript to modify the page after it loads. These changes are not visible to Beautiful Soup because it does not execute JavaScript. I can think of two options:
Upvotes: 1
Reputation: 74
Is there any way you could run a Python web client that actually executes the JavaScript on the page, so that you can then scrape the results?
Upvotes: 0