JEB

Reputation: 5

Python requests get stuck when trying to get web content

I want to get the prices from this instrument on this webpage: http://www.nasdaqomxnordic.com/etp/etf/etfhistorical?languageId=3&Instrument=SSE500

Normally requests.get does the trick, but for this page the script just hangs. I've tried setting a User-Agent as suggested in this answer: How to use Python requests to fake a browser visit a.k.a and generate User Agent?

but no luck. My code:

import requests

url = "http://www.nasdaqomxnordic.com/etp/etf/etfhistorical?languageId=3&Instrument=SSE500"
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"
}

response = requests.get(url, headers=headers)

Upvotes: 0

Views: 912

Answers (2)

RobinFrcd

Reputation: 5476

The User-Agent you're using is very old (at least 8 years old), and may be blocked by very basic protections.

If you switch to a very common User-Agent like 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36' it works fine.

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36'
}

response = requests.get(
    'http://www.nasdaqomxnordic.com/etp/etf/etfhistorical?languageId=3&Instrument=SSE500', 
    headers=headers
)
response.status_code
# 200

And if you need the real data, you'll have to fetch it from a different URL (which you can find with your browser's network inspector):

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36'
}

response = requests.get(
    'http://www.nasdaqomxnordic.com/webproxy/DataFeedProxy.aspx?SubSystem=History&Action=GetChartData&inst.an=id%2Cnm%2Cfnm%2Cisin%2Ctp%2Cchp%2Cycp&FromDate=2022-05-19&ToDate=2022-08-19&json=true&timezone=CET&showAdjusted=false&app=%2Fetp%2Fetf%2Fetfhistorical-HistoryChart&Instrument=SSE500', 
    headers=headers
)
response.json()
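The shape of the JSON this endpoint returns isn't shown here, so the key names below are assumptions; inspect the payload first and then pull out the price series once you know where it lives (this continues from the request above):

data = response.json()   # continues from the request above

# Inspect the top-level structure first -- the exact key names aren't
# guaranteed, so confirm them against what the browser inspector shows.
print(data.keys() if isinstance(data, dict) else type(data))

# Hypothetical example once you know the layout, e.g.:
# prices = data["data"]["ChartData"]   # assumed path, adjust to the real keys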

Upvotes: 1

Pepe Salad

Reputation: 223

It looks like the data on that site's charts is loaded dynamically with JavaScript, so requests won't return a usable result. You can use Selenium to drive an actual browser instance, which will run the JavaScript needed to render the data on the page.

You'll need:

- the selenium package (pip install selenium)
- a driver for your browser, e.g. geckodriver for Firefox (which the example below uses)

Usage example:

from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.FirefoxOptions()
# options.headless = True  # Run without opening a browser window -- usually the first thing people look up after finding Selenium.
driver = webdriver.Firefox(options=options)

# Grabbing a URL using the browser instance.
driver.get("URL")

# Finding an element by ID
example_element = driver.find_element(By.ID, "Element ID")
print(example_element.text)

# Closing the browser instance
driver.quit()
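
For the page in the question specifically, a rough sketch could look like this; the element id below is a guess (check the real one with your browser inspector), and an explicit wait is used because the table is filled in by JavaScript after the page loads:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.FirefoxOptions()
driver = webdriver.Firefox(options=options)
driver.get("http://www.nasdaqomxnordic.com/etp/etf/etfhistorical?languageId=3&Instrument=SSE500")

# Wait up to 20 seconds for the history table to appear; "historicalTable"
# is an assumed id -- replace it with the real one from the inspector.
table = WebDriverWait(driver, 20).until(
    EC.presence_of_element_located((By.ID, "historicalTable"))
)
print(table.text)

driver.quit()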

It'll take some experimenting to make full use of Selenium's capabilities in your code, but there's plenty of documentation (https://selenium-python.readthedocs.io) out there to help you figure it all out.

Upvotes: 1
