Reputation: 11
I want to download the data files from a page where the link to each file sits in a row of a table.
I wrote some code using BeautifulSoup to read the href from every row, but it doesn't give me the list of links to download. I suspect it can't see the table data (td) cells in each table row (tr).
from bs4 import BeautifulSoup
import urllib.request
testurl = 'https://www.ercot.com/mp/data-products/data-product-details?id=NP3-562-CD'
page = urllib.request.urlopen(testurl)
page_content = BeautifulSoup(page, "html.parser")
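table_dt = page_content.find_all("table")
# find_all returns a list of tables, so loop over each one before selecting its rows
for table in table_dt:
    for tt in table.select("tr"):
        print(tt)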
This prints:
<tr>
<th>Friendly Name</th>
<th colspan="2">Posted</th>
<th>Available Files</th>
</tr>
Printing table_dt shows:
[<table class="table table-condensed report-table" id="reportTable">
<thead>
<tr>
<th>Friendly Name</th>
<th colspan="2">Posted</th>
<th>Available Files</th>
</tr>
</thead>
<tbody>
</tbody>
</table>]
As you can see, the tbody is empty: there is no information for the other rows (tr), and only the header row is captured.
Could you please help me get the data link for each row so that I can download the files?
Upvotes: 1
Views: 33
Reputation: 10389
Most likely, the original HTML page contains only the structure of the table, and the row data is loaded afterwards by a JavaScript request. If you can figure out what that JavaScript request is (probably by using your browser's "web developer" tools and watching the network traffic while the page loads), you can fetch the data directly from it.
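As a rough sketch of that approach, assuming the request you find returns JSON: the helper names fetch_json and collect_urls are my own, the sample payload is made up, and none of this reflects the site's real endpoint or response format, which you would have to read off the Network tab yourself.

```python
import json
import urllib.request


def fetch_json(url):
    """Download a URL and decode the response body as JSON."""
    # Some servers reject requests that lack a browser-like User-Agent.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def collect_urls(node):
    """Recursively gather every string value that looks like an http(s) URL."""
    found = []
    if isinstance(node, dict):
        for value in node.values():
            found.extend(collect_urls(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_urls(item))
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        found.append(node)
    return found


# Made-up payload standing in for whatever the real request returns.
sample_payload = {
    "rows": [
        {"name": "report_a", "link": "https://example.com/files/a.zip"},
        {"name": "report_b", "link": "https://example.com/files/b.zip"},
    ]
}
links = collect_urls(sample_payload)
print(links)

# With the real request URL copied from the browser (data_url is a placeholder):
# links = collect_urls(fetch_json(data_url))
```

Walking the whole response instead of indexing specific keys means the sketch does not depend on a field layout I can't see from here. If the response holds relative paths or document IDs rather than full URLs, you would need to adapt collect_urls to build the download links.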
Upvotes: 1