Reputation: 183
This site has stock data and I'm trying to extract some of it: https://quickfs.net/company/AAPL:US
Here AAPL is the stock ticker and can be changed.
The page looks like one big table: the columns are years and the rows are calculated values such as Return on Assets and Gross Margin.
For this I tried to follow a few tutorials:
Introduction to Web Scraping (Python) - Lesson 02 (Scrape Tables)
Intro to Web Scraping with Python and Beautiful Soup
Web Scraping HTML Tables with Python
Web scraping with Python — A to Z Part A — Handling BeautifulSoup and avoiding blocks
I get stuck right at the beginning after importing the packages:
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
This function retrieves the data from the web page:
def make_soup(url):
    # Download the page and parse the returned HTML
    thepage = uReq(url)
    soupdata = soup(thepage, "html.parser")
    return soupdata
Then:
page_soup = make_soup("https://quickfs.net/company/AAPL:US")  # separate name, so the imported soup alias is not overwritten
Now, when I look at the data inside the soup:
page_soup.text
The output is just this and not all the data from the webpage:
'\n\n\n\n\n\n\n\n\n\n\n\nExport Fundamental Data U.S. and International Stocks - QuickFS.net\n\n\n\n\n\n \r\n Loading QuickFS...\r\n \n\n\n\n\n\n\n\n\n\n\n\n\n\n'
I think it's a problem with this specific web page, but I have no idea how to handle it.
Passing a different URL to make_soup(url) sometimes does work.
Any help would be appreciated.
Upvotes: 0
Views: 357
Reputation: 61
That is because the page is fully dynamic: JavaScript does all the rendering in the browser, and BeautifulSoup4 doesn't run JS, so it only sees the "Loading QuickFS..." placeholder.
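If you want to check for this case in code before trying to parse a table, something like the following rough sketch could help. It uses only the standard library, and the heuristic (placeholder text present, no table tag in the static HTML) is my own assumption based on the output in the question, not anything the site documents. The names TextCollector and looks_js_rendered are made up for illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect visible text and note whether any <table> tag was seen."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.has_table = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "table":
            self.has_table = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or style content
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def looks_js_rendered(html, marker="Loading QuickFS"):
    """Heuristic: placeholder text is present and no table is in the HTML."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    has_marker = any(marker in chunk for chunk in parser.chunks)
    return has_marker and not parser.has_table
```

You would pass it the raw HTML string you downloaded. A True result suggests the real data is loaded later by JavaScript, so parsing the static HTML will not get you the table.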
You have two choices here:
In the case of B, you would see that the site is making this call:
curl 'https://api.quickfs.net/stocks/AAPL:US/ovr/Annual/' \
-XGET \
-H 'Accept: application/json, text/plain, */*' \
-H 'Content-Type: application/json' \
-H 'Origin: https://quickfs.net' \
-H 'Accept-Language: en-us' \
-H 'Host: api.quickfs.net' \
-H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15' \
-H 'Referer: https://quickfs.net/company/AAPL:US' \
-H 'Accept-Encoding: gzip, deflate, br' \
-H 'Connection: keep-alive' \
-H 'X-Auth-Token: ' \
-H 'X-Referral-Code: '
What you can do is this instead:
import requests
response = requests.get("https://api.quickfs.net/stocks/AAPL:US/ovr/Annual/")
data = response.json()
Here data will be the raw data that the site uses to present the info:
{
  "datasets": {
    "metadata": {
      "_id": {},
      "qfs_symbol": "NAS:AAPL",
      "currency": "USD",
      "fsCat": "normal",
      "name": "Apple Inc.",
      "gs3_version_at_metadata_update": 20191106,
      "exchange": "NASDAQ",
      "industry": "Technology Hardware & Equipment",
      "symbol": "AAPL",
      "country": "US",
      "price": 278.58,
      ...
  }
}
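Since you said the ticker can change, here is a minimal sketch of wrapping that call in a few helpers. It assumes the endpoint keeps the URL pattern from the curl call above and that the response keeps the datasets/metadata layout shown. Neither is documented as far as I know, and the API may also require the auth token that is blank in the curl headers. The function names are my own, and I used the standard library's urllib instead of requests only to avoid a third-party dependency.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Root of the endpoint seen in the browser's network call (assumed stable)
API_ROOT = "https://api.quickfs.net/stocks"


def build_url(symbol):
    """Return the annual overview URL for a symbol such as 'AAPL:US'."""
    return f"{API_ROOT}/{quote(symbol, safe=':')}/ovr/Annual/"


def fetch_overview(symbol, timeout=10):
    """Download and decode the JSON payload (needs network access)."""
    request = Request(
        build_url(symbol),
        headers={"Accept": "application/json", "User-Agent": "Mozilla/5.0"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def get_metadata(payload):
    """Return the 'metadata' dict, or {} if the layout is different."""
    return payload.get("datasets", {}).get("metadata", {})
```

Usage would be along the lines of get_metadata(fetch_overview("AAPL:US")).get("name"). The .get() calls with defaults are there so a changed or error response gives you an empty dict rather than a KeyError.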
Upvotes: 0