Reputation: 4291
I want to read the page https://www1.hkexnews.hk/listedco/listconews/index/lci.html?lang=zh. Here is my code:
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
url = 'https://www1.hkexnews.hk/listedco/listconews/index/lci.html?lang=zh'
res = requests.get(url, headers=headers)
res.encoding = 'utf-8-sig'
soup = BeautifulSoup(res.text, 'lxml')
However, res.text contains none of the page's data.
I also tried:
from requests_html import HTMLSession
session = HTMLSession()
r = session.get(url)
r.html.render()
It says: pyppeteer.errors.NetworkError: Protocol error Target.closeTarget: Target closed.
What should I do?
Upvotes: 0
Views: 229
Reputation: 1533
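The page loads its data from a separate JSON file, so request that instead:
https://www1.hkexnews.hk/ncms/json/eds/lcisehk1relsdc_1.json
You are welcome.
In case you are curious, the "Network" tab of the browser DevTools is your friend: it lists every request the page makes while loading.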
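As a rough sketch of how the JSON endpoint above could be read from Python: the snippet below uses only the standard library (requests.get(url).json() should work just as well). The helper names fetch_json, describe and main are made up for illustration, and since the layout of the JSON is not described in this thread, the code only reports the top-level structure rather than assuming any field names.

```python
import json
from urllib.request import Request, urlopen

# Endpoint taken from the answer above; its JSON layout is not documented here.
JSON_URL = "https://www1.hkexnews.hk/ncms/json/eds/lcisehk1relsdc_1.json"


def fetch_json(url, timeout=10):
    """Download `url` and decode the response body as JSON (hypothetical helper)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        # utf-8-sig also strips a leading byte-order mark, if one is present
        return json.loads(response.read().decode("utf-8-sig"))


def describe(data):
    """Return a short description of the top-level JSON structure."""
    if isinstance(data, dict):
        return "object with keys: " + ", ".join(sorted(data))
    if isinstance(data, list):
        return "array with %d items" % len(data)
    return "scalar of type " + type(data).__name__


def main():
    # Inspect the structure first, then drill into whichever fields you need.
    print(describe(fetch_json(JSON_URL)))
```

Calling main() performs the request and prints what the top level of the response looks like, which is a reasonable first step before picking out specific fields.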
Upvotes: 1
Reputation: 831
Your code is correct. I ran the script and it works; try loading another page to compare.
import requests
from bs4 import BeautifulSoup # You missed a character 'l'
url = "https://www1.hkexnews.hk/listedco/listconews/index/lci.html?lang=zh"
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
response = requests.get(url, headers=headers)
response.encoding = 'utf-8-sig'
if response.status_code == 200:
    soup = BeautifulSoup(response.text, 'lxml')
    # Raw string so the CSS escapes (\ and \/) reach the selector unchanged
    els = soup.select(r"#Callable\ Bull\/Bear\ Contracts")
    print(els[0])
I got:
<input checked="" class="filterCheckBox strcProdCheckBox" data-value="Callable Bull/Bear Contracts" id="Callable Bull/Bear Contracts" name="Property" tabindex="-1" type="checkbox"/>
Also try the same request with curl:
curl --header "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36" "https://www1.hkexnews.hk/listedco/listconews/index/lci.html?lang=zh"
Upvotes: 0