GKelly

Reputation: 99

Table attribute meanings in BeautifulSoup

For a project using BeautifulSoup I need to extract the "Tesla Quarterly Revenue" table from this site: https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue. I think I've retrieved the initial HTML correctly, but I'm unsure which tag the phrase "Tesla Quarterly Revenue" is attached to. I thought it might be under thead, but that does not output the table.

r=requests.get( 'https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue')
html_data=r.text
soup=BeautifulSoup(html_data)
#print(soup.prettify())
table_=soup.find_all('thead','Tesla Quarterly Revenue')
table_row=table_.find_all('tr')
for row in table_row:
    col = row.find_all("td")
    date =col[0].text
    revenue =col[1].text
    tesla_revenue = tesla_revenue.append({"Date":date, "Revenue":revenue}, ignore_index=True)

tesla_revenue.head()

Relevant part of the soup output:

 <div class="col-xs-6">
       <table class="historical_data_table table">
        <thead>
         <tr>
          <th colspan="2" style="text-align:center">
           Tesla Quarterly Revenue
           <br/>
           <span style="font-size:14px;">
            (Millions of US $)
           </span>
          </th>
         </tr>
        </thead> 

I know I could select the whole area using

soup.find_all('div',class_='col-xs-6') 

But there are multiple tables under this tag and I'm unsure how to narrow it down further. Thanks for any help.

Upvotes: 0

Views: 112

Answers (2)

ali noori

Reputation: 382

First select the div with id main_content, then collect the tables inside it, and finally pick the table whose text contains "Tesla Quarterly Revenue". This code finds the tbody of that table:

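import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue')
html_data = r.text
soup = BeautifulSoup(html_data, 'html.parser')
# Narrow the search to the main content area
main = soup.find('div', id='main_content')
tables = main.find_all('table', class_='historical_data_table table')
table_ = None
for table in tables:
    # Keep the first table whose text mentions the wanted heading
    if 'Tesla Quarterly Revenue' in table.text:
        table_ = table
        break
# The data rows live in the table body, not in thead
table_ = table_.find('tbody')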

table_row=table_.find_all('tr')
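
To get from table_row to the Date/Revenue records the question is after, something like the following might work. It is only a sketch: SAMPLE_HTML is made-up markup shaped like the soup output in the question (the values are placeholders, not real figures), and rows_to_records is a hypothetical helper name. It runs against the sample string rather than the live page so it can be tried offline.

```python
from bs4 import BeautifulSoup

# Made-up sample markup modelled on the question's soup output.
# The table contents are placeholders, not real revenue figures.
SAMPLE_HTML = """
<div id="main_content">
  <table class="historical_data_table table">
    <thead><tr><th colspan="2">Some Other Table</th></tr></thead>
    <tbody><tr><td>2000-12-31</td><td>$10</td></tr></tbody>
  </table>
  <table class="historical_data_table table">
    <thead><tr><th colspan="2">Tesla Quarterly Revenue<br/>
      <span>(Millions of US $)</span></th></tr></thead>
    <tbody>
      <tr><td>2000-06-30</td><td>$2</td></tr>
      <tr><td>2000-03-31</td><td>$1</td></tr>
    </tbody>
  </table>
</div>
"""


def rows_to_records(table_rows):
    """Turn <tr> tags into a list of {"Date": ..., "Revenue": ...} dicts."""
    records = []
    for row in table_rows:
        cols = row.find_all("td")
        if len(cols) < 2:
            # Skip header rows or malformed rows without two data cells
            continue
        records.append({
            "Date": cols[0].get_text(strip=True),
            "Revenue": cols[1].get_text(strip=True),
        })
    return records


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# Pick the first table whose text mentions the wanted heading
table_ = next(
    t for t in soup.find_all("table", class_="historical_data_table")
    if "Tesla Quarterly Revenue" in t.get_text()
)
table_row = table_.find("tbody").find_all("tr")
records = rows_to_records(table_row)
print(records)
```

Collecting plain dicts first and then calling pd.DataFrame(records) avoids DataFrame.append, which was removed in pandas 2.0.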

Upvotes: 1

QHarr

Reputation: 84465

The phrase is in a table header cell (th). You can grab it with the following CSS selector:

soup.select_one('#style-1 div + div .historical_data_table th')

If you want literally the first line only, you can use stripped_strings and index 0:

[s for s in soup.select_one('#style-1 div + div .historical_data_table th').stripped_strings][0]

As there are multiple tables with the class historical_data_table, the above selector uses the element with id style-1 as an anchor. It then moves to a div inside that anchor that immediately follows a sibling div, then to the table with class historical_data_table inside that div, and finally to the th inside that table. The spaces in the selector are descendant combinators, so each of those steps matches at any depth, not only direct children.
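
The selector can be tried on a small stand-in document. This is a sketch under assumptions: SAMPLE_HTML below is invented markup that only mimics the layout the selector expects (an element with id style-1 holding two sibling divs, each with a historical_data_table), and its cell values are placeholders. From the matched th, find_parent is one way to climb back up to the whole table.

```python
from bs4 import BeautifulSoup

# Invented stand-in markup; the real page layout is assumed, not verified.
SAMPLE_HTML = """
<div id="style-1">
  <div class="col-xs-6">
    <table class="historical_data_table table">
      <thead><tr><th colspan="2">Some Other Table</th></tr></thead>
      <tbody><tr><td>2000-12-31</td><td>$10</td></tr></tbody>
    </table>
  </div>
  <div class="col-xs-6">
    <table class="historical_data_table table">
      <thead><tr><th colspan="2">Tesla Quarterly Revenue<br/>
        <span>(Millions of US $)</span></th></tr></thead>
      <tbody><tr><td>2000-03-31</td><td>$1</td></tr></tbody>
    </table>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# "div + div" only matches a div that directly follows a sibling div,
# so the table in the first div is skipped.
th = soup.select_one("#style-1 div + div .historical_data_table th")

# First text fragment of the header cell, with whitespace stripped
heading = next(th.stripped_strings)

# Climb from the header cell to its table, then read the body rows
table = th.find_parent("table")
rows = table.find("tbody").find_all("tr")

print(heading)
print([td.get_text(strip=True) for td in rows[0].find_all("td")])
```

Using next() on stripped_strings gives the same first fragment as building a list and indexing 0, without materialising the whole list.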

Upvotes: 1
