Reputation: 75
I need to run a script daily that scrapes the following site. Each run should scrape the calendar for that day (the equivalent of clicking the "daily" button):
http://www.fxempire.com/economic-calendar/
I want to extract all of the events for that particular day, filter for the relevant currencies (if appropriate), and then create some kind of alert or pop-up 10 minutes before each of those events takes place.
So far I am using the code below to scrape the webpage and then view/print the variable "html", but I cannot find the calendar information that I need in it.
import sys
from PyQt4.QtGui import *
from PyQt4.QtCore import *
from PyQt4.QtWebKit import *

class Render(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()

    def _loadFinished(self, result):
        self.frame = self.mainFrame()
        self.app.quit()

url = 'http://www.fxempire.com/economic-calendar/'
r = Render(url)
html = r.frame.toHtml()
Upvotes: 4
Views: 8656
Reputation: 7821
In my opinion, the best way to scrape data from web pages is to use BeautifulSoup. Here is a quick script that'll get the data you want.
import re
from urllib2 import urlopen
from bs4 import BeautifulSoup
# Get a file-like object using urllib2.urlopen
url = 'http://ecal.forexpros.com/e_cal.php?duration=daily'
html = urlopen(url)
# BS accepts a lot of different data types, so you don't have to do e.g.
# urlopen(url).read(). It accepts file-like objects, so we'll just send in html
# as a parameter.
soup = BeautifulSoup(html)
# Loop over all <tr> elements with class 'ec_bg1_tr' or 'ec_bg2_tr'
for tr in soup.find_all('tr', {'class': re.compile('ec_bg[12]_tr')}):
    # Find the event, currency and actual price by looking up <td> elements
    # with class names.
    event = tr.find('td', {'class': 'ec_td_event'}).text
    currency = tr.find('td', {'class': 'ec_td_currency'}).text
    actual = tr.find('td', {'class': 'ec_td_actual'}).text
    # The returned strings are unicode, so to print them we need to use a
    # unicode format string.
    print u'{:3}\t{:6}\t{}'.format(currency, actual, event)
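The question also asks about filtering for certain currencies and being alerted 10 minutes before each event. Here is a minimal sketch of that part, assuming the rows have already been scraped into (time, currency, event) tuples. The sample rows, the `WANTED` set, the `alerts_for` helper and the HH:MM time format are all assumptions for illustration, not something taken from the page.

```python
from datetime import datetime, timedelta

# Hypothetical sample rows: (time as HH:MM, currency, event name).
# The real page may format times differently.
ROWS = [
    ("08:30", "USD", "Example event A"),
    ("09:00", "JPY", "Example event B"),
    ("10:00", "EUR", "Example event C"),
]

WANTED = {"USD", "EUR"}        # currencies to keep (assumption)
LEAD = timedelta(minutes=10)   # how long before the event to alert


def alerts_for(rows, day, wanted=WANTED, lead=LEAD):
    """Return (alert_time, currency, event) tuples sorted by alert time."""
    alerts = []
    for time_text, currency, event in rows:
        if currency not in wanted:
            continue
        hour, minute = (int(part) for part in time_text.split(":"))
        event_time = datetime(day.year, day.month, day.day, hour, minute)
        alerts.append((event_time - lead, currency, event))
    return sorted(alerts)


if __name__ == "__main__":
    for when, currency, event in alerts_for(ROWS, datetime.now()):
        print("{:%H:%M}  {}  {}".format(when, currency, event))
```

Each alert time could then be handed to a scheduler or compared against the clock in a loop. Which pop-up mechanism to use is left open here.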
To give you some hints of how to solve a problem like this in the future, I've written down the steps I used when solving your problem. Hope it helps.
1. I used Inspect Element in the browser, found the iframe with the info in it by looking in the Elements tab, and opened that URL.
2. The data was in <tr> elements that had the class ec_bg1_tr or ec_bg2_tr.
3. The tr elements with class ec_bg1_tr can be extracted by using soup.find_all('tr', {'class': 'ec_bg1_tr'}). My initial thought was to first loop over these elements, and then loop over the ec_bg2_tr elements.
Upvotes: 3
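The two-loop idea from the steps is what the single regular expression in the script replaces. Here is a small check, using only the `re` module, of which class names the pattern `ec_bg[12]_tr` covers. The third name in the list is a hypothetical non-matching class added for contrast, and the snippet does not touch the page or BeautifulSoup itself.

```python
import re

# The pattern the script passes to find_all for the <tr> class attribute.
ROW_CLASS = re.compile('ec_bg[12]_tr')

# The two class names from the answer, plus one hypothetical name that
# should not match.
candidates = ['ec_bg1_tr', 'ec_bg2_tr', 'ec_header_tr']

# Keep only the names the pattern matches.
matches = [name for name in candidates if ROW_CLASS.search(name)]
print(matches)
```

Since one pattern covers both row classes, a single find_all call can return the rows in page order instead of grouping them by class.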