Reputation: 83
Is it possible to extract the content that comes after the text "Final Text:" (which is plain text, not a tag) using BeautifulSoup?
That is, I expect only:
<td>0 / 22 FAIL</td></tr><tr>
The problem is that many of the tags don't have a class, id, etc. If I extract only <td> tags,
I get all of them, which is not what I need.
<td><strong>Final Text:</strong></td>
<td>0 / 22 FAIL</td></tr><tr>
<td><strong>Ext:</strong></td>
<td>343 / 378 FAIL</td></tr></table>
Upvotes: 0
Views: 80
Reputation: 7248
You can find the <strong>Final Text:</strong> tag using find('strong', text='Final Text:'). Then you can use the find_next() method to get the next <td> tag.
from bs4 import BeautifulSoup
html = '''
<table>
<tr>
<td><strong>Final Text:</strong></td>
<td>0 / 22 FAIL</td>
</tr>
<tr>
<td><strong>Ext:</strong></td>
<td>343 / 378 FAIL</td>
</tr>
</table>
'''
soup = BeautifulSoup(html, 'lxml')
txt = soup.find('strong', text='Final Text:').find_next('td')
print(txt)
Output:
<td>0 / 22 FAIL</td>
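If only the cell's text is needed rather than the whole tag, the same lookup can be wrapped in a small helper. This is a minimal sketch, not the answerer's code: `value_after_label` is a hypothetical name, the built-in `html.parser` is used instead of `lxml` to avoid an extra dependency, and `string=` is the newer spelling of the `text=` argument (available in Beautiful Soup 4.4.0 and later).

```python
from bs4 import BeautifulSoup

html = """
<table>
<tr><td><strong>Final Text:</strong></td>
<td>0 / 22 FAIL</td></tr>
<tr><td><strong>Ext:</strong></td>
<td>343 / 378 FAIL</td></tr>
</table>
"""


def value_after_label(markup, label):
    """Return the text of the first <td> after the <strong> whose text equals label.

    Hypothetical helper: returns None when the label or a following cell is missing.
    """
    soup = BeautifulSoup(markup, "html.parser")
    strong = soup.find("strong", string=label)  # exact match on the tag's string
    if strong is None:
        return None
    cell = strong.find_next("td")  # next <td> in document order
    return cell.get_text(strip=True) if cell is not None else None


final_text = value_after_label(html, "Final Text:")
print(final_text)
```

Returning None for a missing label avoids the AttributeError that chaining find(...).find_next(...) would raise when the first lookup fails.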
Upvotes: 1
Reputation: 46
If the content you're trying to get always comes right after the first <td> tag, why not take the second element of the list of <td> elements? Note that find() returns only the first match, so find_all() is needed to get a list.
soup = BeautifulSoup(html, 'html.parser')
td_list = soup.find_all('td')
td_list[1] # This would be the FAIL element
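The positional idea can be sketched as a runnable script. This is an illustration under assumptions: it relies on the value cell always being the second <td> in the document (or in its row), it assumes exactly two cells per row for the pairing step, and the names `cells`, `second_cell`, and `results` are made up for the example.

```python
from bs4 import BeautifulSoup

html = """
<table>
<tr><td><strong>Final Text:</strong></td>
<td>0 / 22 FAIL</td></tr>
<tr><td><strong>Ext:</strong></td>
<td>343 / 378 FAIL</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like ResultSet that can be indexed;
# find returns a single Tag, so indexing it with [1] would not work.
cells = soup.find_all("td")
second_cell = cells[1].get_text(strip=True)

# Pair label and value cells row by row (assumes two <td> per <tr>).
results = {}
for row in soup.find_all("tr"):
    tds = row.find_all("td")
    if len(tds) == 2:
        label = tds[0].get_text(strip=True)
        results[label] = tds[1].get_text(strip=True)

print(second_cell)
print(results["Final Text:"])
```

Pairing cells per row is somewhat less fragile than a single global index, since an extra row earlier in the table shifts the index but not the label lookup.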
Upvotes: 1
Reputation: 21341
Yes, it's possible. Consider this HTML:
<table>
<tr>
<td><strong>Final Text:</strong></td>
<td>0 / 22 FAIL</td>
</tr>
<tr>
<td><strong>Ext:</strong></td>
<td>343 / 378 FAIL</td>
</tr>
</table>
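This XPath will work:
//*[contains(text(),'Final Text')]/parent::td/following-sibling::td[1]
It finds the tag containing the text Final Text, gets its parent td, and then gets that td's first following sibling td, which is the cell holding 0 / 22 FAIL. (Going up to the parent tr and taking following-sibling::tr would select the Ext row instead.) Note that BeautifulSoup does not evaluate XPath itself, so a library such as lxml is needed.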
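BeautifulSoup does not evaluate XPath, so trying this approach needs another library. Below is a minimal sketch that assumes lxml is installed; `markup`, `cells`, and `value` are illustrative names. The expression walks from the label to its parent td and then to following-sibling::td[1], because in the HTML above the value cell shares a row with the label.

```python
from lxml import html as lxml_html

markup = """
<table>
<tr><td><strong>Final Text:</strong></td>
<td>0 / 22 FAIL</td></tr>
<tr><td><strong>Ext:</strong></td>
<td>343 / 378 FAIL</td></tr>
</table>
""".strip()

tree = lxml_html.fromstring(markup)

# Label element -> its parent <td> -> first following sibling <td>.
cells = tree.xpath(
    "//*[contains(text(),'Final Text')]/parent::td/following-sibling::td[1]"
)

# xpath() returns a list; guard against an empty result.
value = cells[0].text_content().strip() if cells else None
print(value)
```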
Upvotes: 1