padulla

Reputation: 11

Web-scraping a JavaScript table with Python BeautifulSoup

I can't scrape a JavaScript-rendered table with BeautifulSoup; it returns an empty array.

I tried to get the data from this page: https://www.hkex.com.hk/Mutual-Market/Stock-Connect/Statistics/Historical-Daily?sc_lang=en#select4=1&select5=2&select3=0&select2=3&select1=24

import requests, json
text = requests.get("https://www.hkex.com.hk/Mutual-Market/Stock-Connect/Statistics/Historical-Daily?sc_lang=en#select4=0&select5=2&select3=0&select2=3&select1=24")
data = json.loads(text)

print(data['Scty'])

Upvotes: 0

Views: 98

Answers (1)

QHarr

Reputation: 84475

There is another URL you can use, found by inspecting the browser's network tab. With a little string manipulation on the response text, you get a string that can be loaded with json and that contains everything on the page (including the data for all 4 drop-down geographies). There is no need for bs4; you can extract everything you want with the json library.


import requests
import json

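# Data file that the page itself requests (visible in the browser's network tab)
r = requests.get('https://www.hkex.com.hk/eng/csm/DailyStat/data_tab_daily_20190425e.js?_=1556252093686')
# Strip the JavaScript assignment prefix so the remainder parses as JSON
data = json.loads(r.text.replace('tabData = ',''))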

For example, the path to the first row of the table on the landing page:

[Image: path to the first row of the table]
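
The prefix-stripping step can also be tried offline. Below is a minimal sketch, assuming the file is a single `tabData = ...` assignment as the answer's code implies. The helper name `parse_js_assignment` and the sample payload are made up for illustration; the real file's keys and nesting are not shown here and will differ.

```python
import json


def parse_js_assignment(js_text, var_name="tabData"):
    """Return the JSON value assigned to var_name in a one-statement JS file."""
    # Split on the first "=" only; the payload itself may contain "=" later on.
    prefix, sep, payload = js_text.partition("=")
    if not sep or prefix.strip() != var_name:
        raise ValueError(f"expected an assignment to {var_name!r}")
    # Drop surrounding whitespace and an optional trailing semicolon.
    return json.loads(payload.strip().rstrip(";").rstrip())


# Hypothetical payload, NOT the real HKEX structure.
sample = 'tabData = [{"id": 0, "rows": [["a", "1"], ["b", "2"]]}];'
data = parse_js_assignment(sample)
print(data[0]["rows"][0])
```

With a live response, the same helper would be called as `parse_js_assignment(r.text)`. Checking the variable name before parsing makes a changed file format fail with a clear error instead of an opaque JSON decode error.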

Upvotes: 1
