saga

Reputation: 755

How to grab quarterly data and specify the date of Yahoo financial data with Python?

I can download the annual data from this link with the following code, but it does not match what is shown on the website, because the downloaded data is for June:

enter image description here

Now I have two questions:

  1. How do I specify the date so that the annual data matches the following picture (September instead of June, as shown in the red rectangle)?
  2. Clicking "Quarterly", as shown in the orange rectangle, does not change the link. How do I grab the quarterly data?

Thanks.

enter image description here

Upvotes: 4

Views: 4478

Answers (1)

chitown88

Reputation: 28565

Just curious, but why write the HTML to a file first and then read it with pandas? pandas can fetch and parse the HTML from the URL directly:

import pandas as pd

symbol = 'AAPL'
url = 'https://finance.yahoo.com/quote/%s/financials?p=%s' %(symbol, symbol)

dfs = pd.read_html(url)   
print(dfs[0])

Secondly, I'm not sure why yours is showing different dates. Done the way I have it above, the table shows the September dates:

print(dfs[0])
                                         0  ...                                  4
0                                  Revenue  ...                          9/26/2015
1                            Total Revenue  ...                          233715000
2                          Cost of Revenue  ...                          140089000
3                             Gross Profit  ...                           93626000
4                       Operating Expenses  ...                 Operating Expenses
5                     Research Development  ...                            8067000
6       Selling General and Administrative  ...                           14329000
7                            Non Recurring  ...                                  -
8                                   Others  ...                                  -
9                 Total Operating Expenses  ...                          162485000
10                Operating Income or Loss  ...                           71230000
11       Income from Continuing Operations  ...  Income from Continuing Operations
12         Total Other Income/Expenses Net  ...                            1285000
13      Earnings Before Interest and Taxes  ...                           71230000
14                        Interest Expense  ...                            -733000
15                       Income Before Tax  ...                           72515000
16                      Income Tax Expense  ...                           19121000
17                       Minority Interest  ...                                  -
18          Net Income From Continuing Ops  ...                           53394000
19                    Non-recurring Events  ...               Non-recurring Events
20                 Discontinued Operations  ...                                  -
21                     Extraordinary Items  ...                                  -
22            Effect Of Accounting Changes  ...                                  -
23                             Other Items  ...                                  -
24                              Net Income  ...                         Net Income
25                              Net Income  ...                           53394000
26   Preferred Stock And Other Adjustments  ...                                  -
27  Net Income Applicable To Common Shares  ...                           53394000

[28 rows x 5 columns]
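
As the output above shows, the dates sit in row 0 and the section headings ("Operating Expenses", "Net Income", ...) repeat their label across the row. A minimal sketch of tidying that layout follows. It assumes the table keeps the shape printed above; the small frame is built by hand from a few of those values (label column and the 9/26/2015 column only), and `tidy_financials` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Hand-built stand-in for dfs[0], using a few rows from the output above
# (only the label column and the 9/26/2015 column are reproduced).
raw = pd.DataFrame([
    ["Revenue", "9/26/2015"],
    ["Total Revenue", "233715000"],
    ["Cost of Revenue", "140089000"],
    ["Gross Profit", "93626000"],
    ["Operating Expenses", "Operating Expenses"],
    ["Research Development", "8067000"],
    ["Non Recurring", "-"],
])


def tidy_financials(df):
    """Hypothetical helper: promote row 0 to the header and make values numeric."""
    out = df.iloc[1:].copy()
    out.columns = ["Item"] + list(df.iloc[0, 1:])
    # Section headings repeat their label in the value columns; drop them.
    out = out[out.iloc[:, 1] != out["Item"]]
    out = out.set_index("Item")
    # "-" means no value; errors="coerce" turns it into NaN.
    return out.apply(pd.to_numeric, errors="coerce")


tidy = tidy_financials(raw)
print(tidy)
```

With real data you would pass `dfs[0]` instead of `raw`; the date strings then become the column names, so a specific period can be selected by its date.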

For the second part, you could try to find the data in one of a few ways:

1) Check the XHR requests and get the data you want by adding parameters to the request URL that generates it, so that it comes back in JSON format (I could not find such a request right off the bat, so I moved on to the next option)

2) Search through the <script> tags, as the data can sometimes be embedded there in JSON format (I didn't search through them very thoroughly, and figured Selenium would be the more direct way, since pandas can read in the tables)

3) Use Selenium to simulate opening the browser, get the table, click on "Quarterly", then get that table

I went with option 3:

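from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

symbol = 'AAPL'
url = 'https://finance.yahoo.com/quote/%s/financials?p=%s' % (symbol, symbol)

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('C:/chromedriver_win32/chromedriver.exe'))
driver.get(url)

# Get the (annual) table shown in the browser
dfs_annual = pd.read_html(StringIO(driver.page_source))
print(dfs_annual[0])

# Click "Quarterly"
driver.find_element(By.XPATH, "//span[text()='Quarterly']").click()

# Get the (quarterly) table now shown in the browser
dfs_quarter = pd.read_html(StringIO(driver.page_source))
print(dfs_quarter[0])

# End the browser session
driver.quit()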
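
For completeness, option 2 (JSON embedded in a <script> tag) could look roughly like the sketch below. The page string, the variable name `window.__DATA__`, and the JSON layout are all made up for illustration and are not Yahoo's actual markup; `extract_embedded_json` is a hypothetical helper. Only the standard library is used.

```python
import json
import re

# Made-up page: the variable name and the JSON layout are assumptions
# for illustration only, with placeholder values.
html = """
<html><body>
<script>var unrelated = 1;</script>
<script>window.__DATA__ = {"quarterly": [{"period": "Q1", "totalRevenue": 100}]};</script>
</body></html>
"""


def extract_embedded_json(page, var_name):
    """Return the JSON value assigned to var_name in the page, or None."""
    match = re.search(re.escape(var_name) + r"\s*=\s*", page)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (here the trailing ";</script>").
    obj, _ = json.JSONDecoder().raw_decode(page, match.end())
    return obj


data = extract_embedded_json(html, "window.__DATA__")
print(data["quarterly"][0]["totalRevenue"])
```

On a real page you would first inspect the source to find which variable, if any, holds the data, then pass the downloaded HTML and that name to the helper.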

Upvotes: 3
