Reputation: 396
I'm trying to scrape the past 5 years of Yahoo Finance historical data for a particular stock. I have implemented a Python script that scrapes each row of the table containing the historical data. I know there are simpler ways to fetch historical data, but I want to do it with scraping. The problem is that Yahoo Finance has infinite scrolling implemented, i.e. as soon as I reach the end of the page, more rows are added dynamically to the table. My code is fetching rows only up to the end of the first page, not the complete 5 years of data. Here is a sample of the code that I'm trying:
After navigating to the table during the scraping part:
tableRows = table.find_all('tr', class_='BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)')
I'm then extracting values from these rows.
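For reference, extracting cell text from rows matched this way relies on BeautifulSoup's exact-string matching of a multi-class `class` attribute. A minimal, self-contained sketch (the row markup below is a made-up sample, not real Yahoo output):

```python
from bs4 import BeautifulSoup

# Hypothetical snippet of one history-table row; only the class string
# is taken from the question, the cell values are invented.
html = """
<table><tbody>
<tr class="BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)">
  <td><span>18 Feb 2021</span></td>
  <td><span>2,100.00</span></td>
  <td><span>2,120.00</span></td>
</tr>
</tbody></table>
"""

soup = BeautifulSoup(html, 'html.parser')
# Passing the full space-separated string matches the exact class attribute
rows = soup.find_all('tr', class_='BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)')
values = [[td.get_text(strip=True) for td in row.find_all('td')] for row in rows]
print(values)  # [['18 Feb 2021', '2,100.00', '2,120.00']]
```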
Upvotes: 0
Views: 1942
Reputation: 415
Better solutions have already been shown, but here is how it can be done by pressing the END key:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome()
driver.implicitly_wait(6)
driver.get("https://uk.finance.yahoo.com/quote/RELIANCE.NS/history?period1=1297987200&period2=1613606400&interval=1d&filter=history&frequency=1d&includeAdjustedClose=true")
driver.find_element_by_xpath('//*[@id="consent-page"]/div/div/div/form/div[2]/div[2]/button').click()
history_table = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[2]/table/tbody').find_elements_by_tag_name("tr")
# keep scrolling while the oldest loaded row's year is still >= 2020 - 5
while int(history_table[-1].find_elements_by_tag_name("td")[0].text.split()[2]) >= 2020 - 5:
    history_table = driver.find_element_by_xpath(
        '//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[2]/table/tbody').find_elements_by_tag_name("tr")
    action = ActionChains(driver)
    action.send_keys(Keys.END).perform()
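The loop's stop condition parses the year out of the date cell text, e.g. "18 Feb 2021", by taking the third whitespace-separated token. In isolation (with made-up sample values):

```python
# Date strings as rendered in Yahoo's date column (sample values)
cells = ["18 Feb 2021", "02 Mar 2016"]

# text.split()[2] picks the year token from "DD Mon YYYY"
years = [int(text.split()[2]) for text in cells]
print(years)  # [2021, 2016]

# The loop keeps pressing END while the oldest loaded row is still in range
assert all(y >= 2020 - 5 for y in years)
```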
Upvotes: 0
Reputation: 28565
Selenium is one way to do it. A more efficient way is to query the data directly:
import requests
import pandas as pd
import datetime
years = 5
dt = datetime.datetime.now()
past_date = datetime.datetime(year=dt.year-years, month=dt.month, day=dt.day)
url = 'https://query2.finance.yahoo.com/v8/finance/chart/RELIANCE.NS'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36'}
payload = {
    'formatted': 'true',
    'crumb': 'J2oUJNHQwXU',
    'lang': 'en-GB',
    'region': 'GB',
    'includeAdjustedClose': 'true',
    'interval': '1d',
    'period1': str(int(past_date.timestamp())),
    'period2': str(int(dt.timestamp())),
    'events': 'div|split',
    'useYfid': 'true',
    'corsDomain': 'uk.finance.yahoo.com'}
jsonData = requests.get(url, headers=headers, params=payload).json()
result = jsonData['chart']['result'][0]
indicators = result['indicators']
rows = {'timestamp':result['timestamp']}
rows.update(indicators['adjclose'][0])
rows.update(indicators['quote'][0])
df = pd.DataFrame(rows)
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')
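The chart response holds parallel arrays, one entry per trading day, which is why the dicts can simply be merged into one DataFrame. A mocked example of the same reshaping, using invented numbers in place of a live response:

```python
import pandas as pd

# Mocked structure mirroring jsonData['chart']['result'][0]; values are made up
result = {
    'timestamp': [1457408700, 1457495100],
    'indicators': {
        'adjclose': [{'adjclose': [492.14, 499.18]}],
        'quote': [{'open': [499.02, 505.21], 'close': [494.0, 501.0],
                   'high': [500.0, 506.0], 'low': [499.02, 504.52],
                   'volume': [1000, 2000]}],
    },
}

# Same reshaping as above: merge the parallel arrays into one frame
rows = {'timestamp': result['timestamp']}
rows.update(result['indicators']['adjclose'][0])
rows.update(result['indicators']['quote'][0])
df = pd.DataFrame(rows)
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')
print(df.shape)  # (2, 7)
```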
Output:
print(df)
timestamp adjclose ... open low
0 2016-03-08 03:45:00 492.139252 ... 499.019806 499.019806
1 2016-03-09 03:45:00 499.183502 ... 505.211090 504.517670
2 2016-03-10 03:45:00 484.831451 ... 516.132568 499.762756
3 2016-03-11 03:45:00 486.149292 ... 502.685059 500.555237
4 2016-03-14 03:45:00 488.665009 ... 504.765320 501.719208
... ... ... ... ...
1229 2021-03-01 03:45:00 2101.699951 ... 2110.199951 2062.500000
1230 2021-03-02 03:45:00 2106.000000 ... 2122.000000 2089.100098
1231 2021-03-03 03:45:00 2202.100098 ... 2121.050049 2107.199951
1232 2021-03-04 03:45:00 2175.850098 ... 2180.000000 2157.699951
1233 2021-03-05 09:59:59 2178.699951 ... 2156.000000 2153.050049
[1234 rows x 7 columns]
Upvotes: 1
Reputation: 675
I suggest you try the yfinance library (https://pypi.org/project/yfinance/):
import yfinance as yf
msft = yf.Ticker("MSFT")
# get stock info
msft.info
# get historical market data
hist = msft.history(period="max")
Upvotes: 2
Reputation: 784
You need to imitate user behavior inside the browser in order to fetch the rest of the results.
Upvotes: 1