Reputation: 13
I'm trying to get strings from table rows using BeautifulSoup. The strings I want are 'SANDAL' and 'SHORTS', from the second and third rows. I know this can be solved with regular expressions or string functions, but I want to learn BeautifulSoup and do as much as possible with it.
Clipped Python code:
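from bs4 import BeautifulSoup

soup = BeautifulSoup(page, 'html.parser')  # page holds the HTML shown below
table = soup.find('table')
row = table.find_next('tr')  # header row
row = row.find_next('tr')    # second row (SANDAL)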
HTML
<html>
<body>
<div id="body">
<div class="data">
<table id="products">
<tr><td>PRODUCT<td class="ole1">ID<td class="c1">TYPE<td class="ole1">WHEN<td class="ole4">ID<td class="ole4">ID</td></tr>
<tr><td>SANDAL<td class="ole1">77313<td class="ole1">wear<td class="ole1">new<td class="ole4">id<td class="ole4">878717</td></tr>
<tr><td>SHORTS<td class="ole1">77314<td class="ole1">wear<td class="ole1">new<td class="ole4">id<td class="ole4">878718</td></tr>
</table>
</div>
</div>
</body>
</html>
Upvotes: 1
Views: 37
Reputation: 195438
To get the text from the first column of the table (skipping the header row), you can use this script:
from bs4 import BeautifulSoup
txt = '''
<html>
<body>
<div id="body">
<div class="data">
<table id="products">
<tr><td>PRODUCT<td class="ole1">ID<td class="c1">TYPE<td class="ole1">WHEN<td class="ole4">ID<td class="ole4">ID</td></tr>
<tr><td>SANDAL<td class="ole1">77313<td class="ole1">wear<td class="ole1">new<td class="ole4">id<td class="ole4">878717</td></tr>
<tr><td>SHORTS<td class="ole1">77314<td class="ole1">wear<td class="ole1">new<td class="ole4">id<td class="ole4">878718</td></tr>
</table>
</div>
</div>
</body>
</html>'''
soup = BeautifulSoup(txt, 'lxml')  # lxml matters here: it closes the unclosed <td> tags, so each cell is parsed separately
for tr in soup.find('table', id='products').find_all('tr')[1:]:  # [1:] skips the header row
    print(tr.td.text)  # text of the first <td> in the row
Prints:
SANDAL
SHORTS
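If lxml is not installed, a possible variant is to keep `html.parser` and read only the first text node of the first `<td>`. This is a minimal sketch, not a drop-in replacement: it assumes that `html.parser` leaves the unclosed `<td>` tags nested inside one another, so `tr.td.text` would return the whole row's text run together. The helper name `first_column` and the shortened sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's table (illustrative sample data)
html = '''
<table id="products">
<tr><td>PRODUCT<td class="ole1">ID<td class="c1">TYPE</td></tr>
<tr><td>SANDAL<td class="ole1">77313<td class="ole1">wear</td></tr>
<tr><td>SHORTS<td class="ole1">77314<td class="ole1">wear</td></tr>
</table>'''


def first_column(markup, parser='html.parser'):
    """Return the first-cell text of every row except the header."""
    soup = BeautifulSoup(markup, parser)
    rows = soup.find('table', id='products').find_all('tr')[1:]  # skip header
    # find(string=True) returns the first text node under the first <td>,
    # so it does not matter whether the later cells are nested inside it
    return [tr.td.find(string=True).strip() for tr in rows]


names = first_column(html)
print(names)
```

This should give the same two product names as the lxml version, because the first text node of each row is the same under either parser.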
Upvotes: 1