Reputation: 5
I want to get the prices from this instrument on this webpage: http://www.nasdaqomxnordic.com/etp/etf/etfhistorical?languageId=3&Instrument=SSE500
Normally requests.get does the trick, but for this one the script hangs. I've tried setting a user agent as suggested in this answer: How to use Python requests to fake a browser visit a.k.a and generate User Agent? — but no luck. My code:
import requests

url = "http://www.nasdaqomxnordic.com/etp/etf/etfhistorical?languageId=3&Instrument=SSE500"
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"
}
response = requests.get(url, headers=headers)
Upvotes: 0
Views: 912
Reputation: 5476
The User-Agent you're using is very old (Chrome 39 dates from 2014) and may be blocked by even very basic protections.
If you switch to a current, common User-Agent like 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36', it works fine:
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36'
}
response = requests.get(
    'http://www.nasdaqomxnordic.com/etp/etf/etfhistorical?languageId=3&Instrument=SSE500',
    headers=headers
)
response.status_code
# 200
And if you need the actual price data, you'll have to fetch it from a separate URL — the page loads it in the background (you can find this URL in your browser's network inspector):
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36'
}
response = requests.get(
    'http://www.nasdaqomxnordic.com/webproxy/DataFeedProxy.aspx?SubSystem=History&Action=GetChartData&inst.an=id%2Cnm%2Cfnm%2Cisin%2Ctp%2Cchp%2Cycp&FromDate=2022-05-19&ToDate=2022-08-19&json=true&timezone=CET&showAdjusted=false&app=%2Fetp%2Fetf%2Fetfhistorical-HistoryChart&Instrument=SSE500',
    headers=headers
)
response.json()
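The exact shape of the JSON that DataFeedProxy returns isn't shown here, so as a rough sketch: assuming each chart point carries a date and a closing price under keys like "dt" and "cp" (both key names are assumptions for illustration — inspect response.json() to find the real ones), turning the payload into (date, close) pairs could look like this:

```python
import json

# Hypothetical sample of the DataFeedProxy JSON payload. The real key
# names may differ, so inspect response.json() and adjust accordingly.
sample_payload = json.loads("""
{
  "data": [
    {"dt": "2022-08-17", "cp": 171.30},
    {"dt": "2022-08-18", "cp": 172.05},
    {"dt": "2022-08-19", "cp": 170.90}
  ]
}
""")

def extract_prices(payload):
    """Turn the (assumed) list of chart points into (date, close) tuples."""
    return [(point["dt"], point["cp"]) for point in payload["data"]]

prices = extract_prices(sample_payload)
print(prices)
```

With the real response you'd call extract_prices(response.json()) after fixing the key names to whatever the feed actually uses.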
Upvotes: 1
Reputation: 223
It looks like the data on that site's charts is loaded dynamically using JavaScript, so requests
won't return a usable result. You can use Selenium to drive an actual browser instance, which will run the JavaScript needed to render the data on the page.
You'll need:
pip install selenium
Usage example:
from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.FirefoxOptions()
# options.add_argument("--headless")  # Uncomment to run without opening a browser window
driver = webdriver.Firefox(options=options)

# Load a URL in the browser instance
driver.get("URL")

# Find an element by its ID
example_element = driver.find_element(By.ID, "Element ID")
print(example_element.text)

# Close the browser instance
driver.quit()
It takes some experimentation to make full use of Selenium's capabilities in your code, but there's plenty of documentation (https://selenium-python.readthedocs.io) to help you figure it out.
Upvotes: 1