H.Okan

Reputation: 193

Web scraping an interactive webpage

I am working on a project that needs finance data, and I need to scrape historical data from Yahoo Finance. For example, on https://finance.yahoo.com/quote/ETH-USD/history?p=ETH-USD I need to adjust the time interval and press the download button. How can I do this with Python? I need to automate this task.

Sorry for any grammatical mistakes; my native language is not English.

Upvotes: 1

Views: 177

Answers (3)

BushMinusZero

Reputation: 1292

You could use a Selenium WebDriver to load the page, find the WebElement containing the download button, and click() it, but that would be slow and brittle compared to calling the API directly.

My approach to this problem would be to reverse engineer the Yahoo Finance URL and fetch the data with the Requests library. The result is a CSV with the historical data that you're looking for.

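If you look at the download URL, the query parameters are fairly self-explanatory: period1 and period2 are the start and end of the date range as Unix timestamps in seconds, and interval is the sampling interval.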

https://query1.finance.yahoo.com/v7/finance/download/ETH-USD?period1=1581795382&period2=1613417782&interval=1d&events=history&includeAdjustedClose=true

We can see that the key components to modify are the stock ticker, date range, and interval. In code...

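import csv
from datetime import datetime, timedelta
from io import StringIO

import requests


ticker = 'ETH-USD'
url = f'https://query1.finance.yahoo.com/v7/finance/download/{ticker}'

# period1 and period2 are Unix timestamps in seconds; here, the last 365 days
now = datetime.now()
start_ts = int((now - timedelta(days=365)).timestamp())
end_ts = int(now.timestamp())
params = {
    'period1': start_ts,
    'period2': end_ts,
    'interval': '1d',
    'events': 'history',
    # Pass the lowercase string: requests would send a bool True as 'True'
    'includeAdjustedClose': 'true',
}

result = requests.get(url, params=params)
result.raise_for_status()  # fail early on an HTTP error instead of parsing an error page

# The response body is CSV text; print each row tab-separated
f = StringIO(result.content.decode('utf-8'))
reader = csv.reader(f, delimiter=',')
for row in reader:
    print('\t'.join(row))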
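
If you need a fixed date range rather than "the last 365 days", a small helper along these lines could build the same URL. This is only a sketch: `build_download_url` and `to_unix_seconds` are made-up names, treating each date as midnight UTC is an assumption, and it relies on the endpoint still accepting the parameters shown in the URL above.

```python
from datetime import date, datetime, timezone
from urllib.parse import urlencode

# Endpoint copied from the download URL quoted in this answer; it may change.
BASE_URL = 'https://query1.finance.yahoo.com/v7/finance/download'


def to_unix_seconds(day: date) -> int:
    """Return the Unix timestamp (in seconds) of midnight UTC on the given day."""
    # Assumption: midnight UTC is an acceptable boundary for the date range.
    midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return int(midnight.timestamp())


def build_download_url(ticker: str, start: date, end: date, interval: str = '1d') -> str:
    """Hypothetical helper: assemble the download URL for an explicit date range."""
    if start >= end:
        raise ValueError('start must be earlier than end')
    params = {
        'period1': to_unix_seconds(start),
        'period2': to_unix_seconds(end),
        'interval': interval,
        'events': 'history',
        'includeAdjustedClose': 'true',
    }
    return f'{BASE_URL}/{ticker}?{urlencode(params)}'


if __name__ == '__main__':
    print(build_download_url('ETH-USD', date(2020, 2, 15), date(2021, 2, 15)))
```

The resulting string can be passed straight to requests.get() in place of the url and params pair used above.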

Upvotes: 0

Mateus

Reputation: 266

To extract data from Yahoo Finance, you can use a Python library called yfinance.

In your case, you would do this:

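import yfinance as yf

# Use the same symbol as the URL in the question ('ETH-USD')
eth = yf.Ticker('ETH-USD')

# Price history for the past year, returned as a pandas DataFrame
eth_history = eth.history(period="1y")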

You can then do whatever you want with this data (save it to a spreadsheet, for example).

Upvotes: 1

decidedlyjeff

Reputation: 186

You can use a library that implements the Chrome DevTools Protocol (CDP) to automate the Chrome browser or a headless Chromium browser (or any browser supporting this protocol).

Here is one library I found by searching: https://github.com/hyperiongray/trio-chrome-devtools-protocol, but I'm sure there are others too. I have not used it personally.

Upvotes: 0
