Kush G

Reputation: 31

How do you webscrape data from a "span" tag with a "data-reactid" using beautifulsoup in python?

I am trying to extract real-time stock price data from Yahoo Finance. This information is contained in a "span" tag with a "class" and a "data-reactid" attribute, but I am unable to extract it from that tag.

When I run my code, I get neither output nor errors.

I have tried almost all the other answers to this question, but none have worked for me.

<--HTML Code-->
<span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)" data-reactid="34">197.00</span>
#Python Script
from urllib.request import urlopen as u_req
from bs4 import BeautifulSoup as soup

my_url = "https://finance.yahoo.com/quote/AAPL?p=AAPL&.tsrc=fin-srch"
u_client = u_req(my_url)

page_html = u_client.read()
u_client.close()

page_soup = soup(page_html, "html.parser")
container = page_soup.find('span', {"data-reactid":'34'})

I would like to get "197.00" (the real-time price of the stock) as the output.

Upvotes: 3

Views: 1667

Answers (4)

SIM

Reputation: 22440

You can fetch that in a number of ways. Here is one of them:

import requests
from bs4 import BeautifulSoup

res = requests.get('https://finance.yahoo.com/quote/AAPL')
soup = BeautifulSoup(res.text, 'lxml')
price = soup.select_one('#quote-market-notice').find_all_previous()[2].text
print(price)

Another way:

price = soup.select_one("[class*='smartphone_Mt'] span").text
print(price)

Upvotes: 3

Adam Williamson

Reputation: 295

I opened the URL in Chrome and pressed F12. The Network tab showed the page making this request: https://query1.finance.yahoo.com/v8/finance/chart/AAPL?region=US&lang=en-US&includePrePost=false&interval=2m&range=1d&corsDomain=finance.yahoo.com&.tsrc=finance

I would suggest exploring the underlying AJAX calls, which appear to return nicely formatted JSON. The URL also exposes a number of parameters you can modify.
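
As a rough sketch of that approach, the code below requests the chart endpoint and reads the price from the JSON. The response layout (`chart` -> `result` -> `meta` -> `regularMarketPrice`) is an assumption about what the endpoint returns, not something confirmed above. The helper names `extract_price` and `fetch_price` are made up for illustration, and the browser-like User-Agent header is a guess at what the server may expect.

```python
import json
from urllib.request import Request, urlopen

# Endpoint from the Network tab; only the interval and range parameters are kept.
CHART_URL = "https://query1.finance.yahoo.com/v8/finance/chart/{symbol}?interval=2m&range=1d"


def extract_price(payload):
    """Return the latest price from a parsed chart response, or None if absent.

    Assumes the price sits at chart.result[0].meta.regularMarketPrice.
    """
    results = (payload.get("chart") or {}).get("result") or []
    if not results:
        return None
    return results[0].get("meta", {}).get("regularMarketPrice")


def fetch_price(symbol):
    """Download the chart JSON for `symbol` and pull the price out of it."""
    request = Request(
        CHART_URL.format(symbol=symbol),
        headers={"User-Agent": "Mozilla/5.0"},  # assumed; the server may reject the default agent
    )
    with urlopen(request, timeout=10) as response:
        return extract_price(json.load(response))
```

Calling `fetch_price("AAPL")` would then return the price as a number, or `None` if the response does not have the assumed shape.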

Upvotes: 0

QHarr

Reputation: 84465

Given that data-reactid can change, I would select by a unique class instead. Selecting by class is also faster than selecting by attribute.

import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://finance.yahoo.com/quote/AAPL/')
soup = bs(r.content, 'lxml')
# raw string so the backslashes reach the CSS selector unchanged
print(soup.select_one(r'.Mb\(-4px\)').text)
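
To see the difference without hitting the network, here is a small offline sketch that runs both lookups against the span quoted in the question. It uses the built-in "html.parser" so lxml is not needed. The live page may serve different markup, so this only illustrates the selectors.

```python
from bs4 import BeautifulSoup

# The span exactly as quoted in the question.
html = '<span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)" data-reactid="34">197.00</span>'
soup = BeautifulSoup(html, "html.parser")

# Parentheses are not valid in a bare CSS class name, so they are escaped;
# the raw string keeps Python from touching the backslashes.
by_class = soup.select_one(r".Mb\(-4px\)")

# Attribute lookup as in the question; it only matches while the id is still 34.
by_reactid = soup.find("span", {"data-reactid": "34"})

# A different id finds nothing, and find() returns None without raising.
missing = soup.find("span", {"data-reactid": "14"})

print(by_class.text)
print(by_reactid.text)
print(missing)
```

The last lookup shows why a changed data-reactid produces neither output nor an error: `find` quietly returns `None`.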

Upvotes: 0

Viji

Reputation: 11

Somehow the data-reactid changes to 14 when the URL is read from the script.

page_soup = soup(page_html, "html.parser")
container = page_soup.find('span', {"data-reactid":'14'})
if container:
    print(container.text)

Upvotes: 1
