Reputation: 99
For a project using BeautifulSoup I need to extract the "Tesla Quarterly Revenue" table from this site: https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue. I think I've fetched the initial HTML correctly, but I'm unsure which tag the phrase "Tesla Quarterly Revenue" is attached to. I thought it might be under thead, but searching on that does not return the table.
import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue')
html_data = r.text
soup = BeautifulSoup(html_data)
# print(soup.prettify())
table_ = soup.find_all('thead', 'Tesla Quarterly Revenue')
table_row = table_.find_all('tr')
for row in table_row:
    col = row.find_all("td")
    date = col[0].text
    revenue = col[1].text
    tesla_revenue = tesla_revenue.append({"Date": date, "Revenue": revenue}, ignore_index=True)
tesla_revenue.head()
Here is the relevant part of the soup output:
<div class="col-xs-6">
  <table class="historical_data_table table">
    <thead>
      <tr>
        <th colspan="2" style="text-align:center">
          Tesla Quarterly Revenue
          <br/>
          <span style="font-size:14px;">
            (Millions of US $)
          </span>
        </th>
      </tr>
    </thead>
I know I could select the whole area using
soup.find_all('div',class_='col-xs-6')
But there are multiple tables under this tag and I'm unsure how to narrow it down further. Thanks for any help.
Upvotes: 0
Views: 112
Reputation: 382
You should first select the main_content div, then select the tables inside it, and finally pick the right one by its text.
This code will help you find the right tbody:
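import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue')
html_data = r.text
soup = BeautifulSoup(html_data, 'html.parser')
# Narrow the search to the main content area first
main = soup.find('div', id='main_content')
tables = main.find_all('table', class_='historical_data_table table')
table_ = None
for table in tables:
    # Keep the first table whose text contains the wanted title
    if 'Tesla Quarterly Revenue' in table.text:
        table_ = table
        break
# table_ is still None here if no table matched
table_ = table_.find('tbody')
table_row = table_.find_all('tr')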
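To go from the matched table to actual rows, something like the sketch below might work. It runs against a small made-up HTML string rather than the live page, so the markup, the `extract_rows` helper name, and the placeholder values are all assumptions for illustration; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page. The values are placeholders,
# not real revenue figures, and the real markup may differ.
SAMPLE_HTML = """
<div id="main_content">
  <div class="col-xs-6">
    <table class="historical_data_table table">
      <thead><tr><th colspan="2">Tesla Annual Revenue<br/><span>(Millions of US $)</span></th></tr></thead>
      <tbody>
        <tr><td>2020</td><td>$300</td></tr>
      </tbody>
    </table>
  </div>
  <div class="col-xs-6">
    <table class="historical_data_table table">
      <thead><tr><th colspan="2">Tesla Quarterly Revenue<br/><span>(Millions of US $)</span></th></tr></thead>
      <tbody>
        <tr><td>2020-12-31</td><td>$200</td></tr>
        <tr><td>2020-09-30</td><td>$100</td></tr>
      </tbody>
    </table>
  </div>
</div>
"""


def extract_rows(html, title):
    """Return the rows of the first table whose header contains `title`."""
    soup = BeautifulSoup(html, "html.parser")
    for table in soup.find_all("table", class_="historical_data_table"):
        header = table.find("th")
        if header is None or title not in header.get_text():
            continue
        # Fall back to the table itself if there is no explicit tbody
        body = table.find("tbody") or table
        rows = []
        for tr in body.find_all("tr"):
            cells = tr.find_all("td")
            if len(cells) < 2:
                continue  # skip header rows and malformed rows
            rows.append({
                "Date": cells[0].get_text(strip=True),
                "Revenue": cells[1].get_text(strip=True),
            })
        return rows
    return []


quarterly = extract_rows(SAMPLE_HTML, "Tesla Quarterly Revenue")
print(quarterly)
```

The list of dicts can be passed straight to pandas.DataFrame if you want a table, which also avoids calling append inside the loop.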
Upvotes: 1
Reputation: 84465
It is in a table header (th). You can grab it with the following CSS selector:
soup.select_one('#style-1 div + div .historical_data_table th')
If you literally want only the first line, you can use stripped_strings and take index 0:
[s for s in soup.select_one('#style-1 div + div .historical_data_table th').stripped_strings][0]
As there are multiple tables with the class historical_data_table, the above selector uses the element with id style-1 as an anchor, then moves to the table with class historical_data_table that sits inside a div which is an immediate sibling of another div, both inside that anchor; it then moves to the th inside that table.
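Here is a small sketch of how that selector behaves, run against a made-up HTML string instead of the live page. The sample markup is an assumption that only mimics the structure described above (two sibling divs under an element with id style-1).

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure described above;
# the real page may differ.
SAMPLE_HTML = """
<div id="style-1">
  <div>
    <table class="historical_data_table table">
      <thead><tr><th>Tesla Annual Revenue<br/><span>(Millions of US $)</span></th></tr></thead>
    </table>
  </div>
  <div>
    <table class="historical_data_table table">
      <thead><tr><th>Tesla Quarterly Revenue<br/><span>(Millions of US $)</span></th></tr></thead>
    </table>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# "div + div" only matches a div directly preceded by a sibling div,
# so the first table is skipped and the second one is selected.
header = soup.select_one("#style-1 div + div .historical_data_table th")

# stripped_strings yields each text fragment with surrounding whitespace removed
lines = list(header.stripped_strings)
title = lines[0]
print(title)
```

From the header you can walk back up to the table with header.find_parent('table') and then read its rows as usual.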
Upvotes: 1