Candice

Reputation: 199

How do I parse info from Yahoo Finance with Beautiful Soup?

This is what I have so far, using soup.findAll('span'):

<span data-reactid="12">Previous Close</span>,
     <span class="Trsdu(0.3s) " data-reactid="14">5.52</span>,
     <span data-reactid="17"></span>,
     <span class="Trsdu(0.3s) " data-reactid="19">5.49</span>,
     <span data-reactid="38">Volume</span>,
     <span class="Trsdu(0.3s) " data-reactid="40">1,164,604</span>,
     ...

I want a table that shows me:

Open 5.49
Volume 1,164,604

I tried soup.findAll('span').text, but it gives this error message:

ResultSet object has no attribute 'text'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

This is the source:

https://finance.yahoo.com/quote/gxl.ax?p=gxl.ax

Upvotes: 1

Views: 191

Answers (2)

chitown88

Reputation: 28565

soup.findAll('span') returns a ResultSet, which is a list of elements, not a single element. You have to iterate through it to get the text of each one. So try:

spans = soup.findAll('span')
for ele in spans:
    data = ele.text
    print(data)
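
Building on that loop, here is a minimal sketch of how the labels and values could be paired up. It assumes the HTML looks like the snippet in the question, where value spans carry a class attribute and label spans do not. The hard-coded html string is only a stand-in for the real page, and the "Open" label in it is an assumption, since that span shows up empty in the question's output.

```python
from bs4 import BeautifulSoup

# Stand-in for the real page, modeled on the spans shown in the question.
# The "Open" label is assumed; it appears empty in the question's output.
html = """
<span data-reactid="12">Previous Close</span>
<span class="Trsdu(0.3s) " data-reactid="14">5.52</span>
<span data-reactid="17">Open</span>
<span class="Trsdu(0.3s) " data-reactid="19">5.49</span>
<span data-reactid="38">Volume</span>
<span class="Trsdu(0.3s) " data-reactid="40">1,164,604</span>
"""

soup = BeautifulSoup(html, "html.parser")

rows = {}
label = None
for span in soup.find_all("span"):
    text = span.get_text(strip=True)
    if span.has_attr("class"):
        # A span with a class is treated as a value for the last label seen.
        if label:
            rows[label] = text
        label = None
    else:
        # A span without a class is treated as a label.
        label = text

for name in ("Open", "Volume"):
    print(name, rows[name])
```

If the real page marks values differently, the has_attr("class") test would need to change accordingly.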

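To take your output and put it into a dataframe:

import pandas as pd

# Labels and values must strictly alternate for the slicing below to pair them up,
# so labels with no value span (Bid, Ask, Day's Range, 52 Week Range) are left out.
your_output = ['Previous Close', '5.52', 'Open', '5.49', 'Volume', '1,164,604', 'Avg. Volume', '660,530']

headers = your_output[::2]   # even positions: the labels
data = your_output[1::2]     # odd positions: the values

df = pd.DataFrame([data], columns = headers)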

Additional

You certainly can use BeautifulSoup to parse the page and build a dataframe by iterating through the elements, but I would like to offer an alternative to BeautifulSoup.

Pandas does most of the work for you with .read_html, provided it can identify tables within the HTML. You can get the dataframe-style table you are looking for that way.

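import pandas as pd

# URL from the question
url = 'https://finance.yahoo.com/quote/gxl.ax?p=gxl.ax'

# read_html returns a list of dataframes, one per table found in the page
tables = pd.read_html(url)
df = pd.concat(tables)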

Output:

print (df)
                          0             1
0            Previous Close          5.50
1                      Open          5.50
2                       Bid      5.47 x 0
3                       Ask      5.51 x 0
4               Day's Range   5.47 - 5.51
5             52 Week Range   3.58 - 6.49
6                    Volume        634191
7               Avg. Volume        675718
0                Market Cap      660.137M
1         Beta (3Y Monthly)          0.10
2            PE Ratio (TTM)         31.49
3                 EPS (TTM)          0.17
4             Earnings Date           NaN
5  Forward Dividend & Yield  0.15 (2.82%)
6          Ex-Dividend Date    2019-02-12
7             1y Target Est          5.17
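
To get from that combined dataframe to just the rows asked for in the question, one option is to index by the label column and select by name. This is only a sketch: the dataframe is rebuilt by hand from a few rows of the output above (as strings) so that it runs without fetching the page, and the variable name wanted is made up for illustration.

```python
import pandas as pd

# A few rows copied from the output above, standing in for the read_html result.
df = pd.DataFrame([
    ["Previous Close", "5.50"],
    ["Open", "5.50"],
    ["Volume", "634191"],
    ["Avg. Volume", "675718"],
])

# Column 0 holds the labels and column 1 the values,
# so index by the labels and pick the rows of interest.
wanted = df.set_index(0).loc[["Open", "Volume"], 1]
print(wanted)
```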

Upvotes: 2

Will

Reputation: 7017

Luckily the error gives us a hint:

You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

Try one of these:

soup.findAll('span')[0].text
soup.findAll('span')[i].text
soup.find('span').text

This is a generic problem when navigating many selector systems, CSS selectors included. To read a property such as .text, you need a single element rather than a collection. findAll() returns a ResultSet, which behaves like a list, so you can either index into it (e.g. [i]) or use find() to get only the first match.
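
A small sketch of the three options side by side, using a made-up two-span document rather than the real page:

```python
from bs4 import BeautifulSoup

# Made-up HTML, just enough to show the difference between find and find_all.
soup = BeautifulSoup("<span>Open</span><span>5.49</span>", "html.parser")

# find() returns a single element (the first match), so .text works directly.
first = soup.find("span").text

# find_all() returns a list-like ResultSet, so index into it first.
second = soup.find_all("span")[1].text

# Or loop over every match to collect all the texts.
all_texts = [span.text for span in soup.find_all("span")]

print(first, second, all_texts)
```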

Upvotes: 2
