Reputation: 31
I am trying to fetch the price data from dukascopy.com, but I am running into a problem similar to this user's: the price data itself is not part of the HTML. So when I run my basic urllib code to extract the data:
import urllib.request
url = 'https://www.dukascopy.com'
headers = {'User-Agent':'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'}
req = urllib.request.Request(url, headers = headers)
resp = urllib.request.urlopen(req)
respData = resp.read()
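# Decode the bytes so the output is readable text rather than a b'...' repr
print(respData.decode('utf-8', errors='replace'))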
the price data cannot be found in the output. Referring back to this post, the user Mark found another URL that the data was loaded from. Can the same approach be applied to collect the data here as well?
Upvotes: 3
Views: 227
Reputation: 17064
Try dryscrape. You can scrape JavaScript-rendered pages with it. Don't parse web pages with regular expressions; it's not a good idea. Read this on why you should not parse HTML pages with regex: HTML with regex. Use BeautifulSoup for parsing.
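import dryscrape
from bs4 import BeautifulSoup
url = 'https://www.dukascopy.com'
# Load the page in a headless session so that its JavaScript runs
session = dryscrape.Session()
session.visit(url)
# body() returns the HTML after rendering
response = session.body()
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(response, 'html.parser')
print(soup)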
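As for the approach from the other post: if the browser's Network tab shows the prices being loaded from a separate URL, you may be able to request that URL directly and skip rendering altogether. The following is only a rough sketch under assumptions. `DATA_URL` is a hypothetical placeholder (I have not identified the real endpoint), the `Referer` header is a guess at what such an endpoint might check, and `parse_payload` and `fetch_prices` are names made up for illustration. It assumes the endpoint returns either plain JSON or JSONP.

```python
import json
import urllib.request

# Hypothetical placeholder: replace with the URL seen in the browser's Network tab.
DATA_URL = 'https://example.com/path/to/price-feed'


def parse_payload(raw):
    """Decode a response body that is either plain JSON or JSONP."""
    text = raw.decode('utf-8').strip()
    # JSONP wraps the JSON in a function call, e.g. callback({...});
    if not text.startswith(('{', '[')):
        start = text.index('(') + 1
        end = text.rindex(')')
        text = text[start:end]
    return json.loads(text)


def fetch_prices(url=DATA_URL):
    """Request the data URL directly and return the parsed payload."""
    headers = {
        'User-Agent': 'Mozilla/5.0',
        # Assumption: the endpoint may expect requests to come from the site itself.
        'Referer': 'https://www.dukascopy.com',
    }
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return parse_payload(resp.read())
```

Once you have the real URL, `fetch_prices(real_url)` would give you a dict or list to work with, so no HTML parsing is needed for that part. If no such endpoint exists or it rejects direct requests, rendering the page as above is the fallback.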
Upvotes: 1