vindl

Reputation: 83

Parsing using BeautifulSoup

Is it possible to use BeautifulSoup to extract the content that comes after the text "Final Text:" (which is plain text, not a tag)?

That is, I expect only:

<td>0&nbsp;/&nbsp;22&nbsp;FAIL</td></tr><tr>

The problem is that many of the tags don't have a class or id. If I extract only <td>, I get every cell, which is not what I want.

<td><strong>Final Text:</strong></td>
<td>0&nbsp;/&nbsp;22&nbsp;FAIL</td></tr><tr>
<td><strong>Ext:</strong></td>
<td>343&nbsp;/&nbsp;378&nbsp;FAIL</td></tr></table>

Upvotes: 0

Views: 80

Answers (3)

Keyur Potdar

Reputation: 7248

You can find the <strong>Final Text:</strong> tag using find('strong', text='Final Text:'). Then, you can use the find_next() method to get the next <td> tag.

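from bs4 import BeautifulSoup

html = '''
<table>
    <tr>
        <td><strong>Final Text:</strong></td>
        <td>0&nbsp;/&nbsp;22&nbsp;FAIL</td>
    </tr>
    <tr>
        <td><strong>Ext:</strong></td>
        <td>343&nbsp;/&nbsp;378&nbsp;FAIL</td>
    </tr>
</table>
'''

soup = BeautifulSoup(html, 'lxml')  # the 'lxml' parser requires the lxml package

# Locate the <strong> label by its exact text, then take the next <td> in document order.
# Newer Beautiful Soup releases name this argument string= instead of text=.
txt = soup.find('strong', text='Final Text:').find_next('td')
print(txt)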

Output:

<td>0 / 22 FAIL</td>
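If only the cell's text is needed rather than the whole tag, the same lookup can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the function name value_after_label is hypothetical, it uses the built-in 'html.parser' so that lxml is not required, and it assumes the label text matches exactly. The &nbsp; entities come back as non-breaking spaces (\xa0), so they are replaced with ordinary spaces.

```python
from bs4 import BeautifulSoup

html = """
<table>
    <tr>
        <td><strong>Final Text:</strong></td>
        <td>0&nbsp;/&nbsp;22&nbsp;FAIL</td>
    </tr>
    <tr>
        <td><strong>Ext:</strong></td>
        <td>343&nbsp;/&nbsp;378&nbsp;FAIL</td>
    </tr>
</table>
"""


def value_after_label(markup, label):
    """Return the text of the <td> that follows a <strong> label, or None.

    Hypothetical helper for illustration; assumes the label matches exactly.
    """
    soup = BeautifulSoup(markup, "html.parser")
    label_tag = soup.find("strong", string=label)
    if label_tag is None:
        return None  # label not present in the markup
    cell = label_tag.find_next("td")
    if cell is None:
        return None  # label found, but no <td> follows it
    # &nbsp; is parsed as \xa0; normalise it to a plain space.
    return cell.get_text().replace("\xa0", " ").strip()


final_text = value_after_label(html, "Final Text:")
print(final_text)
```

Returning None for a missing label avoids the AttributeError that chaining find(...).find_next(...) raises when find() matches nothing.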

Upvotes: 1

chasezimmy

Reputation: 46

If the content you're trying to get always comes right after the first <td></td> tag, why not take the second element of the list of <td> elements?

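from bs4 import BeautifulSoup

# html is the markup from the question
soup = BeautifulSoup(html, 'html.parser')

# find() returns only the first match; find_all() returns a list of every <td>
td_list = soup.find_all('td')
td_list[1]  # This would be the FAIL element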
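A positional index breaks as soon as the row order changes, so one variation is to pair the cells row by row and look the value up by its label. The following is a sketch under the assumption that every row holds a label cell followed by a value cell, as in the question's table. The variable name results is illustrative only.

```python
from bs4 import BeautifulSoup

html = """
<table>
    <tr>
        <td><strong>Final Text:</strong></td>
        <td>0&nbsp;/&nbsp;22&nbsp;FAIL</td>
    </tr>
    <tr>
        <td><strong>Ext:</strong></td>
        <td>343&nbsp;/&nbsp;378&nbsp;FAIL</td>
    </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

results = {}
for row in soup.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) < 2:
        continue  # skip rows that do not have a label/value pair
    label = cells[0].get_text(strip=True)
    # &nbsp; is parsed as \xa0; normalise it to a plain space.
    value = cells[1].get_text().replace("\xa0", " ").strip()
    results[label] = value

print(results["Final Text:"])
```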

Upvotes: 1

Umair Ayub

Reputation: 21341

Yes, it's possible. Consider this HTML:

<table>
    <tr>
        <td><strong>Final Text:</strong></td>
        <td>0&nbsp;/&nbsp;22&nbsp;FAIL</td>
    </tr>
    <tr>
        <td><strong>Ext:</strong></td>
        <td>343&nbsp;/&nbsp;378&nbsp;FAIL</td>
    </tr>
</table>

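This XPath will work (Beautiful Soup does not evaluate XPath itself, so run it with a library such as lxml):

//*[contains(text(),'Final Text')]/parent::td/following-sibling::td

Find the tag containing the text Final Text, get its parent td, then get that td's following sibling td, which sits in the same row and holds the value.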
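Since Beautiful Soup has no XPath support, an XPath approach needs another library. Here is a minimal sketch using lxml, assuming it is installed. The expression used here selects the td that follows the label's td within the same row. The variable names markup, cells and value are illustrative.

```python
from lxml import html as lxml_html

markup = """
<table>
    <tr>
        <td><strong>Final Text:</strong></td>
        <td>0&nbsp;/&nbsp;22&nbsp;FAIL</td>
    </tr>
    <tr>
        <td><strong>Ext:</strong></td>
        <td>343&nbsp;/&nbsp;378&nbsp;FAIL</td>
    </tr>
</table>
"""

tree = lxml_html.fromstring(markup)

# Label element -> its enclosing <td> -> the next <td> in the same row.
cells = tree.xpath(
    "//*[contains(text(), 'Final Text')]/parent::td/following-sibling::td"
)

# xpath() returns a list; it is empty when the label is not found.
value = None
if cells:
    # &nbsp; is parsed as \xa0; normalise it to a plain space.
    value = cells[0].text_content().replace("\xa0", " ").strip()

print(value)
```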

Upvotes: 1
