Reputation: 862
My HTML has several tables. The first table is:
<table>
<tr>
<td>
<div id="string">
</div>
</td>
</tr>
</table>
The rest are of the form:
<table class="confluenceTable" data-csvtable="1">
<tbody>
<tr>
<th class="highlight-grey confluenceTh" data-highlight-colour="grey" rowspan="2" style="text-align: center;">Negev</th>
I want to scrape data from the tables. When I use:
from bs4 import BeautifulSoup
from urllib.request import urlopen
url = 'XXX'
soup = BeautifulSoup(urlopen(url).read(), "lxml")
for table in soup.findAll('table'):
    print(table)
It only finds the first table. When I change the search to:
soup.findAll("table", { "class" : "confluenceTable" })
It doesn't find anything. What am I missing?
I am using Python 3.4 on Windows with BeautifulSoup 4.5.
Upvotes: 1
Views: 3638
Reputation: 474221
I suspect you are trying to scrape an Atlassian Confluence page, which is usually quite dynamic and relies heavily on JavaScript to load its content. If you look at the HTML source you download with urllib, you will not find any table elements with the confluenceTable class.
Instead, you should either look into using the Confluence API, or use a browser automation tool like selenium.
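One way to check this suspicion before switching tools is to count the table tags in the HTML that was actually downloaded and compare that with the page after a browser has rendered it. The sketch below is only an illustration under assumptions: count_tables and fetch_rendered_html are hypothetical helper names, the two sample strings stand in for the real page (which is not shown in the question), and the selenium part assumes Chrome with a matching driver is installed and that the page needs no login.

```python
from html.parser import HTMLParser


class TableClassCounter(HTMLParser):
    """Count <table> start tags, overall and those carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.total = 0
        self.matching = 0

    def handle_starttag(self, tag, attrs):
        if tag != "table":
            return
        self.total += 1
        # The class attribute may be absent or empty; treat both as "no classes".
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.matching += 1


def count_tables(html, css_class="confluenceTable"):
    """Return (number of tables, number of tables with css_class) in html."""
    parser = TableClassCounter(css_class)
    parser.feed(html)
    parser.close()
    return parser.total, parser.matching


def fetch_rendered_html(url, css_class="confluenceTable", timeout=10):
    """Hypothetical helper: load url in a real browser and return the rendered HTML.

    Assumes selenium, Chrome and a matching driver are available.
    """
    # Imported lazily so the diagnostic above works without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait until JavaScript has inserted at least one matching table.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "table." + css_class))
        )
        return driver.page_source
    finally:
        driver.quit()


# Stand-in for what urllib downloads: only the first table from the question.
static_html = '<table><tr><td><div id="string"></div></td></tr></table>'
# Stand-in for the page after JavaScript has added a Confluence table.
rendered_html = static_html + (
    '<table class="confluenceTable" data-csvtable="1">'
    "<tbody><tr><th>Negev</th></tr></tbody></table>"
)

print(count_tables(static_html))    # (1, 0)
print(count_tables(rendered_html))  # (2, 1)
```

If count_tables on the output of urlopen(url).read().decode() reports zero matching tables while the browser shows them, the tables are being added by JavaScript. In that case the string returned by fetch_rendered_html(url) can be passed to BeautifulSoup in place of the urllib download.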
Upvotes: 2