Reputation: 755
I can download the annual data from this link by the following code, but it's not the same as what's shown on the website because it's the data of June:
Now I have two questions:
Thanks.
Upvotes: 4
Views: 4478
Reputation: 28565
Just curious, but why write the html to file first and then read it with pandas? Pandas can take in the html request directly:
import pandas as pd
symbol = 'AAPL'
url = 'https://finance.yahoo.com/quote/%s/financials?p=%s' %(symbol, symbol)
dfs = pd.read_html(url)
print(dfs[0])
Secondly, not sure why yours is popping up with the yearly dates. Doing the way as I have it above is showing September.
print(dfs[0])
0 ... 4
0 Revenue ... 9/26/2015
1 Total Revenue ... 233715000
2 Cost of Revenue ... 140089000
3 Gross Profit ... 93626000
4 Operating Expenses ... Operating Expenses
5 Research Development ... 8067000
6 Selling General and Administrative ... 14329000
7 Non Recurring ... -
8 Others ... -
9 Total Operating Expenses ... 162485000
10 Operating Income or Loss ... 71230000
11 Income from Continuing Operations ... Income from Continuing Operations
12 Total Other Income/Expenses Net ... 1285000
13 Earnings Before Interest and Taxes ... 71230000
14 Interest Expense ... -733000
15 Income Before Tax ... 72515000
16 Income Tax Expense ... 19121000
17 Minority Interest ... -
18 Net Income From Continuing Ops ... 53394000
19 Non-recurring Events ... Non-recurring Events
20 Discontinued Operations ... -
21 Extraordinary Items ... -
22 Effect Of Accounting Changes ... -
23 Other Items ... -
24 Net Income ... Net Income
25 Net Income ... 53394000
26 Preferred Stock And Other Adjustments ... -
27 Net Income Applicable To Common Shares ... 53394000
[28 rows x 5 columns]
For the second part, you could try to find the data 1 of a few ways:
1) Check the XHR requests and get the data you want by including parameters to the request url that generates that data and can return to you in json format (which when I looked for, I could not find right off the bat, so moved on to the next option)
2) Search through the <script>
tags, as the json format can sometimes be within those tags (which I didn't search through very thoroughly, and think Selenium would just be a direct way since pandas can read in the tables)
3) Use selenium to simulate opening the browser, getting the table, and clicking on "Quarterly", then getting that table
I went with option 3:
from selenium import webdriver
import pandas as pd
symbol = 'AAPL'
url = 'https://finance.yahoo.com/quote/%s/financials?p=%s' %(symbol, symbol)
driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
driver.get(url)
# Get Table shown in browser
dfs_annual = pd.read_html(driver.page_source)
print(dfs_annual[0])
# Click "Quarterly"
driver.find_element_by_xpath("//span[text()='Quarterly']").click()
# Get Table shown in browser
dfs_quarter = pd.read_html(driver.page_source)
print(dfs_quarter[0])
driver.close()
Upvotes: 3