Mark Clements

Reputation: 465

Find a List of Tags Based on Text Value of Children in Beautiful Soup

I have a question about selecting a list of tags (or a single tag) using a condition on one of the attributes of their children. Specifically, given the HTML code:

<tbody>
<tr class="" data-row="0">
<tr class="" data-row="1">
<tr class="" data-row="2">
    <td align="right" csk="13">13</td>
    <td align="left" csk="Jones,Andre"><a href="/players/andre-jones-2.html">Andre Jones</a>       
    </td>
<tr class="" data-row="3">
    <td align="right" csk="7">7</td>
    <td align="left" csk="Jones,DeAndre"><a href="/players/deandre-jones-1.html">DeAndre Jones</a>
    </td>
 <tr class="" data-row="4">
 <tr class="" data-row="5">

I have a unicode variable, Player, coming from an external loop, and I am trying to look through each row in the table to extract the <tr> tags where Player == Table.tr.a.text, so that I can identify duplicate player names in Table. For instance, if more than one player has Player = Andre Jones, MyRow should contain every <tr> tag with that player's name; if only one row has Player = Andre Jones, MyRow should contain just the single <tr> whose anchor text equals Andre Jones. I've been trying things like

Table = soup.find('tbody')
MyRow = Table.find_all(lambda X: X.name=='tr' and Player == X.text)

But this returns [] for MyRow. If I use

MyRow = Table.find_all(lambda X: X.name=='tr' and Player in X.text)

this picks any <tr> that has Player as a substring of X.text. In the example code above, it extracts both <tr> tags: the one with Table.tr.td.a.text=='Andre Jones' and the one with Table.tr.td.a.text=='DeAndre Jones'. Any help would be appreciated.

Upvotes: 2

Views: 2565

Answers (2)

B.Mr.W.

Reputation: 19628

Whatever you desire. :)

Solution1

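Logic: find the first tag whose name is tr and whose anchor text equals 'FooName' (exact match), or whose text, including that of its children, contains 'FooName' (fuzzy match).

# Exact match on the anchor text. Comparing against tag.text fails here because
# a row's text also includes the number cell and the surrounding whitespace.
print(Table.find(lambda tag: tag.name == 'tr' and tag.a is not None and tag.a.get_text(strip=True) == 'FooName'))
# Fuzzy match (substring of the whole row's text)
# print(Table.find(lambda tag: tag.name == 'tr' and 'FooName' in tag.text))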

Output:

<tr class="" data-row="2">
<td align="right" csk="3">3</td>
<td align="left" csk="Wentz,Parker">
<a href="/players/Foo-Name-1.html">FooName</a>
</td>
</tr>
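
Since the question asks for every row with a given player name, not just the first, here is a minimal sketch of the same idea using find_all. It assumes each row holds one anchor with the player's name. The sample HTML is modelled on the question's table with every row closed, the third row (andre-jones-3.html) is made up to show a duplicate, and rows_for_player is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical sample based on the question's table; the data-row="4" entry
# is invented to demonstrate a duplicate player name.
HTML = """
<table><tbody>
<tr class="" data-row="2">
    <td align="right" csk="13">13</td>
    <td align="left" csk="Jones,Andre"><a href="/players/andre-jones-2.html">Andre Jones</a>
    </td>
</tr>
<tr class="" data-row="3">
    <td align="right" csk="7">7</td>
    <td align="left" csk="Jones,DeAndre"><a href="/players/deandre-jones-1.html">DeAndre Jones</a>
    </td>
</tr>
<tr class="" data-row="4">
    <td align="right" csk="21">21</td>
    <td align="left" csk="Jones,Andre"><a href="/players/andre-jones-3.html">Andre Jones</a>
    </td>
</tr>
</tbody></table>
"""


def rows_for_player(table, player):
    """Return every <tr> whose first <a> has text exactly equal to player."""
    def is_match(tag):
        if tag.name != 'tr':
            return False
        link = tag.find('a')
        # Compare only the anchor text, not the whole row's text.
        return link is not None and link.get_text(strip=True) == player
    return table.find_all(is_match)


soup = BeautifulSoup(HTML, 'html.parser')
table = soup.find('tbody')
matches = rows_for_player(table, 'Andre Jones')
print([row['data-row'] for row in matches])  # ['2', '4']
```

A result with more than one element means the name is duplicated; the DeAndre Jones row is skipped because the comparison is an equality test rather than a substring test.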

Solution2

Logic: find the text node that matches FooName (here, the text inside the anchor tag), then go up the tree to its nearest ancestor whose tag name is tr.

# Exact match (find() returns None if nothing matches, so this assumes a hit)
print(Table.find(text='FooName').find_parent('tr'))
# Fuzzy match
# import re
# print(Table.find(text=re.compile('FooName')).find_parent('tr'))

Output:

<tr class="" data-row="2">
<td align="right" csk="3">3</td>
<td align="left" csk="Wentz,Parker">
<a href="/players/Foo-Name-1.html">FooName</a>
</td>
</tr>
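
The bottom-up approach can also collect every matching row and cope with a name that is not in the table. The following is a rough sketch, assuming a bs4 version that accepts the string= argument (the newer spelling of text=) and one anchor per row. The sample HTML and the rows_by_anchor_text name are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample; the second "Andre Jones" row is invented.
HTML = """
<table><tbody>
<tr data-row="2"><td>13</td><td><a href="/players/andre-jones-2.html">Andre Jones</a></td></tr>
<tr data-row="3"><td>7</td><td><a href="/players/deandre-jones-1.html">DeAndre Jones</a></td></tr>
<tr data-row="4"><td>21</td><td><a href="/players/andre-jones-3.html">Andre Jones</a></td></tr>
</tbody></table>
"""


def rows_by_anchor_text(table, player):
    """Find each <a> whose string equals player, then climb to its <tr>."""
    rows = []
    for link in table.find_all('a', string=player):
        row = link.find_parent('tr')
        if row is not None:
            rows.append(row)
    return rows


soup = BeautifulSoup(HTML, 'html.parser')
table = soup.find('tbody')
print([row['data-row'] for row in rows_by_anchor_text(table, 'Andre Jones')])  # ['2', '4']
print(rows_by_anchor_text(table, 'Nobody'))  # []
```

Because find_all returns an empty list when nothing matches, this avoids the AttributeError that calling find_parent on a None result from find would raise.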

Upvotes: 2

Blender

Reputation: 298206

You could do this easily with XPath and lxml:

import lxml.html

root = lxml.html.fromstring('''...''')
# xpath() returns a list of every matching <tr> element
rows = root.xpath('//tr[.//a[text() = "FooName"]]')
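
For a version that can be run as-is, here is a small sketch with the elided HTML filled in by a made-up table (the second "Andre Jones" row is invented). It passes the player name as an XPath variable, which lxml accepts as a keyword argument to xpath(), so the name does not have to be spliced into the expression string.

```python
import lxml.html

# Hypothetical sample standing in for the elided HTML string.
HTML = """<table><tbody>
<tr data-row="2"><td>13</td><td><a href="/players/andre-jones-2.html">Andre Jones</a></td></tr>
<tr data-row="3"><td>7</td><td><a href="/players/deandre-jones-1.html">DeAndre Jones</a></td></tr>
<tr data-row="4"><td>21</td><td><a href="/players/andre-jones-3.html">Andre Jones</a></td></tr>
</tbody></table>"""

root = lxml.html.fromstring(HTML)
player = 'Andre Jones'

# $name is bound to the keyword argument of the same name.
rows = root.xpath('//tr[.//a[text() = $name]]', name=player)
print([row.get('data-row') for row in rows])  # ['2', '4']
```

The text() = $name test is an exact comparison, so the DeAndre Jones row is not returned.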

The BeautifulSoup "equivalent" would be something like:

rows = soup.find('tbody').find_all('tr')
# First matching row, or None if no row has a matching anchor
row = next((row for row in rows if row.find('a', text='FooName')), None)

Or if you think about it backwards:

row = soup.find('a', text='FooName').find_parent('tr')

Upvotes: 3
