Reputation: 65
Can BeautifulSoup select a table that has no attributes? There are many tables in my HTML, but the data I want is in the one whose <table> tag has no attributes.
Here is my example. There are two tables in the HTML: one contains English letters, the other contains numbers.
from bs4 import BeautifulSoup
HTML2 = """
<table>
<tr>
<td class>a</td>
<td class>b</td>
<td class>c</td>
<td class>d</td>
</tr>
<tr>
<td class>e</td>
<td class>f</td>
<td class>g</td>
<td class>h</td>
</tr>
</table>
<table cellpadding="0">
<tr>
<td class>111</td>
<td class>222</td>
<td class>333</td>
<td class>444</td>
</tr>
<tr>
<td class>555</td>
<td class>666</td>
<td class>777</td>
<td class>888</td>
</tr>
</table>
"""
soup2 = BeautifulSoup(HTML2, 'html.parser')
f2 = soup2.select('table[cellpadding!="0"]') #<---I think the key point is here.
for div in f2:
    row = ''
    rows = div.findAll('tr')
    for row in rows:
        if(row.text.find('td') != False):
            print(row.text)
I only want the data in the "english" table, formatted like this:
a b c d
e f g h
Then I want to save it to Excel.
But I can only access the "number" table. Any hints? Thanks!
Upvotes: 0
Views: 2166
Reputation: 15376
You could use find_all and select only tables that don't have a specific attribute:
f2 = soup2.find_all('table', {'cellpadding':None})
Or if you want to select tables that have absolutely no attributes:
f2 = [tbl for tbl in soup2.find_all('table') if not tbl.attrs]
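The two filters give different results when a table carries some unrelated attribute. Here is a minimal sketch of that difference; the SAMPLE markup and its id attribute are made up for illustration and are not from the question:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one bare table, one with an unrelated attribute,
# and one with cellpadding.
SAMPLE = """
<table><tr><td>bare</td></tr></table>
<table id="x"><tr><td>other attribute</td></tr></table>
<table cellpadding="0"><tr><td>padded</td></tr></table>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")
tables = soup.find_all("table")

# Tables that are missing one specific attribute (cellpadding).
no_cellpadding = [t for t in tables if not t.has_attr("cellpadding")]

# Tables whose <table> tag has no attributes at all.
no_attrs = [t for t in tables if not t.attrs]

print([t.td.text for t in no_cellpadding])  # ['bare', 'other attribute']
print([t.td.text for t in no_attrs])        # ['bare']
```

For the HTML in the question both filters pick the same table, so either one should do.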
Then collect the cell text from f2 row by row and pass the result to a DataFrame:
data = [
    [td.text for td in tr.find_all('td')]
    for table in f2 for tr in table.find_all('tr')
]
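Putting the pieces together, a rough end-to-end sketch might look like the following. It assumes pandas is installed; the helper names attribute_free_rows and save_rows, the shortened HTML, and the file name english.xlsx are all hypothetical, chosen only for this example:

```python
from bs4 import BeautifulSoup
import pandas as pd

# Shortened version of the question's HTML (assumption: same structure).
HTML = """
<table>
<tr><td>a</td><td>b</td><td>c</td><td>d</td></tr>
<tr><td>e</td><td>f</td><td>g</td><td>h</td></tr>
</table>
<table cellpadding="0">
<tr><td>111</td><td>222</td><td>333</td><td>444</td></tr>
</table>
"""


def attribute_free_rows(html):
    """Return cell texts, one list per row, from tables with no attributes."""
    soup = BeautifulSoup(html, "html.parser")
    tables = [tbl for tbl in soup.find_all("table") if not tbl.attrs]
    return [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for table in tables
        for tr in table.find_all("tr")
    ]


def save_rows(frame, path="english.xlsx"):
    """Write the frame to an Excel file without index or header row.

    Writing .xlsx requires an Excel engine such as openpyxl to be installed.
    """
    frame.to_excel(path, index=False, header=False)


data = attribute_free_rows(HTML)
df = pd.DataFrame(data)

# Prints the rows as space-separated text without index or header.
print(df.to_string(index=False, header=False))
```

Calling save_rows(df) afterwards would write the two rows to english.xlsx in the current directory.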
Upvotes: 2
Reputation: 301
You can use the has_attr method to test whether a table has the cellpadding attribute:
soup2 = BeautifulSoup(HTML2, 'html.parser')
f2 = soup2.find_all('table')
for table in f2:
    # Skip any table that carries a cellpadding attribute
    if not table.has_attr('cellpadding'):
        for row in table.find_all('tr'):
            # Join the cell texts with spaces, e.g. "a b c d"
            print(' '.join(td.text for td in row.find_all('td')))
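Since the question tried to do the filtering inside select, the same test can also be written as a CSS selector. This is only a sketch of an alternative, assuming a Beautiful Soup version whose select is backed by the soupsieve package (4.7 or later); the HTML here is a shortened stand-in for the question's:

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the question's HTML.
HTML = """
<table>
<tr><td>a</td><td>b</td><td>c</td><td>d</td></tr>
<tr><td>e</td><td>f</td><td>g</td><td>h</td></tr>
</table>
<table cellpadding="0">
<tr><td>111</td><td>222</td><td>333</td><td>444</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# :not([cellpadding]) matches tables that have no cellpadding attribute,
# whatever value the attribute would otherwise hold.
lines = [
    " ".join(td.get_text(strip=True) for td in tr.select("td"))
    for table in soup.select("table:not([cellpadding])")
    for tr in table.select("tr")
]

for line in lines:
    print(line)
# a b c d
# e f g h
```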
Upvotes: 1