Reputation: 131
I am trying to grab the Income Statement table of McDonald's Corporation (MCD) from "https://finance.yahoo.com/quote/MCD/financials?p=MCD". I used Beautiful Soup. The HTML downloads, but there seem to be no typical "tr" or "td" tags for the income statement table. How can I convert the income statement table into a DataFrame (df)?
My code:
import requests
from bs4 import BeautifulSoup

url = "https://finance.yahoo.com/quote/MCD/financials?p=MCD"
result = requests.get(url)
result.raise_for_status()
result.encoding = "utf-8"
src = result.content
soup = BeautifulSoup(src, 'lxml')
print(soup)

# Collect the <td> cells of every <tr> row (finds nothing on this page)
array = []
for tr_tag in soup.find_all('tr'):
    b_tag = tr_tag.find_all('td')
    array.append(b_tag)
print(array)
Upvotes: 1
Views: 137
Reputation: 3155
"Download Income Statement from web page using BeautifulSoup..."
First, you call soup.find_all('tr'); however, there are no tr tags in the income statement table. On the website, each row is a div tag with a specific class. Specifying the class tells the program exactly which elements you want from the page. I used the div class "D(tbr) fi-row Bgc($hoverBgColor):h" because it is consistent across every row of the table. You can then use the text attribute to get the raw text of each row instead of its HTML.
import requests
from bs4 import BeautifulSoup

url = "https://finance.yahoo.com/quote/MCD/financials?p=MCD"
result = requests.get(url)
result.raise_for_status()
result.encoding = "utf-8"
src = result.content
soup = BeautifulSoup(src, 'lxml')

# Each table row is a div carrying this exact class string
rows = []
for i in soup.find_all('div', {'class': 'D(tbr) fi-row Bgc($hoverBgColor):h'}):
    row = i.text
    rows.append(row)
print(rows)
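Note that the text attribute joins everything inside a row div into one string, so the label and the numbers run together. If you want one list of cells per row, something like the sketch below may help. The HTML in it is a simplified, made-up stand-in for the page (the real markup is more deeply nested and may have changed), and it matches on the single class "fi-row" rather than the full class string, which is only my assumption about what is least brittle.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the page markup (not the real Yahoo HTML).
SAMPLE_HTML = """
<div class="D(tbr) fi-row Bgc($hoverBgColor):h">
  <div><span>Total Revenue</span></div>
  <div><span>21,076,500</span></div>
  <div><span>21,025,200</span></div>
</div>
<div class="D(tbr) fi-row Bgc($hoverBgColor):h">
  <div><span>Cost of Revenue</span></div>
  <div><span>9,961,200</span></div>
  <div><span>10,239,200</span></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

rows = []
# Matching one class ("fi-row") does not depend on the order of the other classes.
for row_div in soup.find_all("div", class_="fi-row"):
    # stripped_strings yields each non-blank text node with whitespace trimmed
    cells = list(row_div.stripped_strings)
    rows.append(cells)

for cells in rows:
    print(cells)
```

Each entry in rows is then a list such as [label, value, value, ...] instead of one long string, which is much easier to turn into a table later.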
Upvotes: 1
Reputation: 24940
As mentioned in the comments, here's your step 1:
targets = soup.find("div", {'data-reactid': '41'})
rows = []
for target in targets:
    data = target.find_all('span')
    row = []
    for d in data:
        row.append(d.text)
    rows.append(row)
for row in rows:
    print(row)
output:
['Total Revenue', '21,076,500', '21,025,200', '22,820,400', '24,621,900']
['Cost of Revenue', '9,961,200', '10,239,200', '12,199,600', '14,417,200']
['Gross Profit', '11,115,300', '10,786,000', '10,620,800', '10,204,700']
['Operating Expenses', 'Research Development', 'Selling General and Administrative', '2,229,400', '2,200,200', '2,231,300', '2,384,500', 'Total Operating Expenses', '2,045,500', '2,200,200', '2,231,300', '2,384,500']
etc.
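For step 2, getting from those lists to a DataFrame, here is a rough sketch using pandas. It assumes every clean row has a label followed by four values, and it simply skips rows that do not fit that shape (like the merged 'Operating Expenses' row above), so you would still need to handle those separately. The column names period_1 to period_4 and the helper names to_number and rows_to_frame are placeholders I made up; the real reporting dates are not shown in the output.

```python
import pandas as pd

# Rows copied from the output above; the last one mixes several table rows.
rows = [
    ['Total Revenue', '21,076,500', '21,025,200', '22,820,400', '24,621,900'],
    ['Cost of Revenue', '9,961,200', '10,239,200', '12,199,600', '14,417,200'],
    ['Gross Profit', '11,115,300', '10,786,000', '10,620,800', '10,204,700'],
    ['Operating Expenses', 'Research Development',
     'Selling General and Administrative',
     '2,229,400', '2,200,200', '2,231,300', '2,384,500',
     'Total Operating Expenses',
     '2,045,500', '2,200,200', '2,231,300', '2,384,500'],
]

# Placeholder column names (assumption): the real dates are not in the output.
PERIODS = ["period_1", "period_2", "period_3", "period_4"]


def to_number(text):
    """Parse '21,076,500' into an int; return None for non-numeric cells."""
    try:
        return int(text.replace(",", ""))
    except ValueError:
        return None


def rows_to_frame(rows, periods=PERIODS):
    """Build a DataFrame from rows shaped like [label, v1, ..., vn]."""
    records = {}
    for row in rows:
        if len(row) != len(periods) + 1:
            continue  # skip section headers and merged rows
        label, *values = row
        records[label] = [to_number(v) for v in values]
    # One table row per label, one column per period
    return pd.DataFrame.from_dict(records, orient="index", columns=periods)


df = rows_to_frame(rows)
print(df)
```

Stripping the thousands separators before calling int is what makes the columns numeric; left as strings, they would not sum or compare correctly.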
Upvotes: 1