L. Chen

Reputation: 31

How to use urllib and re to retrieve live price data with Python

I am trying to retrieve price data from dukascopy.com, but I am running into a problem similar to this user's: the price data itself is not part of the HTML. So when I run my basic urllib code to extract the data:

import urllib.request
url = 'https://www.dukascopy.com'
headers = {'User-Agent':'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'}
req = urllib.request.Request(url, headers = headers)
resp = urllib.request.urlopen(req)
respData = resp.read()
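# Decode the bytes so the HTML prints as text rather than as a b'...' literal
print(respData.decode('utf-8', errors='replace'))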

the price data cannot be found. In this post, the user Mark found another URL that the data was loaded from. Can the same approach be used to collect the data here as well?

Upvotes: 3

Views: 227

Answers (1)

Mohammad Yusuf

Reputation: 17064

Try dryscrape; it can scrape JavaScript-rendered pages. Don't parse web pages with the regex module; it's not a good idea. Read this on why you should not parse HTML with regex: HTML with regex. Use BeautifulSoup for parsing.

import dryscrape
from bs4 import BeautifulSoup

url = 'https://www.dukascopy.com'
session = dryscrape.Session()
session.visit(url)
response = session.body()
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(response, 'html.parser')
print(soup)
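
If the page loads its prices through a separate request, as in the post the question mentions, a lighter alternative may be to call that URL directly instead of rendering the page. Below is a minimal sketch, assuming you have found such a URL in your browser's Network tab and that it returns JSON. `DATA_URL`, `fetch_bytes`, `parse_quotes` and the `symbol`/`bid`/`ask` layout are all made up for illustration; they are not Dukascopy's actual API.

```python
import json
import urllib.request

# Hypothetical placeholder: substitute the request URL you see in the
# browser's Network tab while the price widget is updating.
DATA_URL = "https://example.com/api/quotes"


def fetch_bytes(url, user_agent="Mozilla/5.0"):
    """Download the raw response body from url."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()


def parse_quotes(payload):
    """Convert a JSON payload into {symbol: (bid, ask)}.

    Assumes an invented layout: a list of objects with
    'symbol', 'bid' and 'ask' keys. Adapt to the real response.
    """
    records = json.loads(payload.decode("utf-8"))
    return {r["symbol"]: (float(r["bid"]), float(r["ask"])) for r in records}


# Offline demonstration with a made-up payload (no network access needed).
sample = b'[{"symbol": "EUR/USD", "bid": "1.0612", "ask": "1.0614"}]'
print(parse_quotes(sample))
```

Against a live endpoint you would call `parse_quotes(fetch_bytes(DATA_URL))`. Because the response is JSON, neither regex nor an HTML parser is needed for this route.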

Upvotes: 1
