Reputation: 19
I'm trying to pull information off of this web page (which loads its data through an AJAX call to this page).
I'm able to print out the whole page, but the find_all function just returns a blank list. What am I doing wrong?
from bs4 import BeautifulSoup
import requests
url = "http://financials.morningstar.com/finan/financials/getFinancePart.html?&callback=jsonp1653673850875&t=XNAS:AAPL&region=usa&culture=en-US&cur=&order=asc&_=1653673850919"
def pageText():
    result = requests.get(url)
    doc = BeautifulSoup(result.text, "html.parser")
    return doc
specialNum = pageText()
print(specialNum)
specialNum = pageText().find_all('literally anything I am trying to pull off of the page')
print(specialNum) #This will always print a blank list
Apologies if this is a stupid question. I'm a bit of a beginner.
Upvotes: 1
Views: 72
Reputation: 25048
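The response is JSONP rather than a plain HTML page: a JavaScript callback wrapped around a JSON object that holds the HTML as a string under componentData. As mentioned by @furas, if you remove the parameter and value callback=jsonp1653673850875 from the URL, the server will send pure JSON and you can get the HTML directly via r.json()['componentData'].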
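A minimal sketch of the callback-free request, assuming the server behaves as described above when the callback parameter is missing. The helper names drop_callback and fetch_component_html are made up for illustration; only the URL and the componentData key come from this thread.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

URL = (
    "http://financials.morningstar.com/finan/financials/getFinancePart.html"
    "?&callback=jsonp1653673850875&t=XNAS:AAPL&region=usa&culture=en-US"
    "&cur=&order=asc&_=1653673850919"
)


def drop_callback(url: str) -> str:
    """Return the URL with the JSONP 'callback' query parameter removed."""
    parts = urlsplit(url)
    # keep_blank_values=True preserves empty parameters such as 'cur='
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "callback"
    ]
    # safe=":" keeps the ticker readable (XNAS:AAPL instead of XNAS%3AAAPL)
    return urlunsplit(parts._replace(query=urlencode(query, safe=":")))


def fetch_component_html(url: str) -> str:
    """Request the endpoint without the callback and return the embedded HTML.

    Assumes the response is a JSON object with a 'componentData' key.
    """
    import requests  # imported here so drop_callback works without requests

    r = requests.get(drop_callback(url), timeout=10)
    r.raise_for_status()
    return r.json()["componentData"]


# No network access needed for this line; it only rewrites the URL.
print(drop_callback(URL))
```

Calling fetch_component_html(URL) should then give an HTML string that can be passed to BeautifulSoup or pandas.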
The simplest approach, in my opinion, is to unwrap the JSON string and convert it with json.loads() to access the HTML.
From there you can go with BeautifulSoup or pandas to scrape the content.
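The unwrapping step can be tried offline before hitting the server. This is only a sketch: unwrap_jsonp is a hypothetical helper name, and the sample string is a made-up, shortened stand-in for the real response body.

```python
import json


def unwrap_jsonp(text: str) -> dict:
    """Strip the callback wrapper from a JSONP response and parse the JSON inside."""
    start = text.index("(") + 1   # first '(' opens the callback call
    end = text.rindex(")")        # last ')' closes it
    return json.loads(text[start:end])


# Hypothetical, shortened response body for illustration only.
sample = 'jsonp123({"componentData": "<table><tr><th>Revenue</th><td>1</td></tr></table>"})'

html = unwrap_jsonp(sample)["componentData"]
print(html)
```

The printed value is ordinary HTML, which is what find_all or select needs to see.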
import json, requests
from bs4 import BeautifulSoup

# Same URL as in the question, callback parameter still included
r = requests.get('http://financials.morningstar.com/finan/financials/getFinancePart.html?&callback=jsonp1653673850875&t=XNAS:AAPL&region=usa&culture=en-US&cur=&order=asc&_=1653673850919')

# Strip the jsonp...(...) wrapper, parse the JSON, and pass the HTML to BeautifulSoup
soup = BeautifulSoup(
    json.loads(
        r.text.split('(',1)[-1].rsplit(')',1)[0]
    )['componentData'],
    'html.parser'
)

for row in soup.select('table tr'):
    ...
import json, requests
from io import StringIO
import pandas as pd

r = requests.get('http://financials.morningstar.com/finan/financials/getFinancePart.html?&callback=jsonp1653673850875&t=XNAS:AAPL&region=usa&culture=en-US&cur=&order=asc&_=1653673850919')

# Newer pandas versions expect a file-like object rather than a literal HTML string
pd.read_html(StringIO(json.loads(
    r.text.split('(',1)[-1].rsplit(')',1)[0]
)['componentData']))[0].dropna()
Unnamed: 0 | 2012-09 | 2013-09 | 2014-09 | 2015-09 | 2016-09 | 2017-09 | 2018-09 | 2019-09 | 2020-09 | 2021-09 | TTM |
---|---|---|---|---|---|---|---|---|---|---|---|
Revenue USD Mil | 156508 | 170910 | 182795 | 233715 | 215639 | 229234 | 265595 | 260174 | 274515 | 365817 | 386017 |
Gross Margin % | 43.9 | 37.6 | 38.6 | 40.1 | 39.1 | 38.5 | 38.3 | 37.8 | 38.2 | 41.8 | 43.3 |
Operating Income USD Mil | 55241 | 48999 | 52503 | 71230 | 60024 | 61344 | 70898 | 63930 | 66288 | 108949 | 119379 |
Operating Margin % | 35.3 | 28.7 | 28.7 | 30.5 | 27.8 | 26.8 | 26.7 | 24.6 | 24.1 | 29.8 | 30.9 |
Net Income USD Mil | 41733 | 37037 | 39510 | 53394 | 45687 | 48351 | 59531 | 55256 | 57411 | 94680 | 101935 |
Earnings Per Share USD | 1.58 | 1.42 | 1.61 | 2.31 | 2.08 | 2.3 | 2.98 | 2.97 | 3.28 | 5.61 | 6.15 |
Dividends USD | 0.09 | 0.41 | 0.45 | 0.49 | 0.55 | 0.6 | 0.68 | 0.75 | 0.8 | 0.85 | 0.88 |
Payout Ratio % * | — | 27.4 | 28.5 | 22.3 | 24.8 | 26.5 | 23.7 | 25.1 | 23.7 | 16.3 | 14.3 |
Shares Mil | 26470 | 26087 | 24491 | 23172 | 22001 | 21007 | 20000 | 18596 | 17528 | 16865 | 16585 |
Book Value Per Share * USD | 4.25 | 4.9 | 5.15 | 5.63 | 5.93 | 6.46 | 6.04 | 5.43 | 4.26 | 3.91 | 4.16 |
Operating Cash Flow USD Mil | 50856 | 53666 | 59713 | 81266 | 65824 | 63598 | 77434 | 69391 | 80674 | 104038 | 116426 |
Cap Spending USD Mil | -9402 | -9076 | -9813 | -11488 | -13548 | -12795 | -13313 | -10495 | -7309 | -11085 | -10633 |
Free Cash Flow USD Mil | 41454 | 44590 | 49900 | 69778 | 52276 | 50803 | 64121 | 58896 | 73365 | 92953 | 105793 |
Free Cash Flow Per Share * USD | 1.58 | 1.61 | 1.93 | 2.96 | 2.24 | 2.41 | 2.88 | 3.07 | 4.04 | 5.57 | — |
Working Capital USD Mil | 19111 | 29628 | 5083 | 8768 | 27863 | 27831 | 14473 | 57101 | 38321 | 9355 | — |
Upvotes: 2