Reputation: 465
I have a question about selecting a list of tags (or a single tag) using a condition on one of the attributes of its children. Specifically, given the HTML code:
<tbody>
<tr class="" data-row="0">
<tr class="" data-row="1">
<tr class="" data-row="2">
<td align="right" csk="13">13</td>
<td align="left" csk="Jones,Andre"><a href="/players/andre-jones-2.html">Andre Jones</a>
</td>
<tr class="" data-row="3">
<td align="right" csk="7">7</td>
<td align="left" csk="Jones,DeAndre"><a href="/players/deandre-jones-1.html">DeAndre Jones</a>
</td>
<tr class="" data-row="4">
<tr class="" data-row="5">
I have a unicode variable, Player, coming from an external loop, and I am trying to look through each row in the table to extract the <tr> tags with Player==Table.tr.a.text and to identify duplicate player names in Table. So, for instance, if there is more than one player with Player=Andre Jones, the MyRow object should return all <tr> tags that contain that player's name, while if there is only one row with Player=Andre Jones, then MyRow should just contain the single <tr> element whose anchor text is equal to Andre Jones. I've been trying things like
Table = soup.find('tbody')
MyRow = Table.find_all(lambda X: X.name=='tr' and Player == X.text)
But this returns [] for MyRow. If I use
MyRow = Table.find_all(lambda X: X.name=='tr' and Player in X.text)
This picks any <tr> that has Player as a substring of X.text. In the example code above, it extracts both <tr> tags: the one with Table.tr.td.a.text=='Andre Jones' and the one with Table.tr.td.a.text=='DeAndre Jones'. Any help would be appreciated.
Upvotes: 2
Views: 2565
Reputation: 19628
Whatever you desire. :)
Solution1
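Logic: find the first tag whose tag name is tr and whose text, including its children's text, contains 'FooName'. For an exact match, compare against the anchor text instead: tag.text on a tr also includes the other cells and the whitespace between them, so it never equals the name on its own.
# Exact Match on the anchor text (text is unicode, turn into str)
print Table.find(lambda tag: tag.name=='tr' and tag.a is not None and 'FooName' == tag.a.text.strip().encode('utf-8'))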
# Fuzzy Match
# print Table.find(lambda tag: tag.name=='tr' and 'FooName' in tag.text)
Output:
<tr class="" data-row="2">
<td align="right" csk="3">3</td>
<td align="left" csk="Wentz,Parker">
<a href="/players/Foo-Name-1.html">FooName</a>
</td>
</tr>
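A minimal sketch of why the exact comparison against a whole row's text comes back empty while the substring test over-matches, assuming Python 3 and bs4 with the built-in html.parser. The markup is adapted from the question (closing tags added), and row_has_player is a hypothetical helper, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Markup adapted from the question; closing tags added so it parses cleanly.
HTML = """
<table><tbody>
<tr class="" data-row="2">
<td align="right" csk="13">13</td>
<td align="left" csk="Jones,Andre"><a href="/players/andre-jones-2.html">Andre Jones</a>
</td>
</tr>
<tr class="" data-row="3">
<td align="right" csk="7">7</td>
<td align="left" csk="Jones,DeAndre"><a href="/players/deandre-jones-1.html">DeAndre Jones</a>
</td>
</tr>
</tbody></table>
"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("tbody")
player = "Andre Jones"


def row_has_player(tag, name):
    """Hypothetical helper: True for a <tr> holding an <a> whose stripped text equals name."""
    if tag.name != "tr":
        return False
    return any(a.get_text(strip=True) == name for a in tag.find_all("a"))


# The row's own text holds the number cell and newlines as well as the name.
row_text = table.find("tr").get_text()

# Exact comparison against the whole row's text: matches nothing.
exact_on_row = table.find_all(lambda t: t.name == "tr" and t.get_text() == player)

# Substring comparison: also picks up "DeAndre Jones".
substring_rows = table.find_all(lambda t: t.name == "tr" and player in t.get_text())

# Exact comparison against the anchor text only.
matching_rows = table.find_all(lambda t: row_has_player(t, player))

print(repr(row_text))
print([r["data-row"] for r in matching_rows])
```

Comparing at the anchor level keeps the match exact without depending on how the surrounding cells are laid out.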
Solution2
Logic: find the text node equal to FooName (the text inside the anchor tag in this case), then go up the tree to the nearest ancestor whose tag name is tr.
# Exact Match
print Table.find(text='FooName').find_parent('tr')
# Fuzzy Match
# import re
# print Table.find(text=re.compile('FooName')).find_parent('tr')
Output:
<tr class="" data-row="2">
<td align="right" csk="3">3</td>
<td align="left" csk="Wentz,Parker">
<a href="/players/Foo-Name-1.html">FooName</a>
</td>
</tr>
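Since the question asks for every row when a name is duplicated, the find-then-go-up idea can be extended to collect all matches rather than the first one. This is only a sketch, assuming Python 3 and bs4 4.4 or later (where the text argument is also accepted as string); rows_for_player and the duplicated row 4 are made up for illustration.

```python
from bs4 import BeautifulSoup

# Markup adapted from the question; row 4 is an invented duplicate of "Andre Jones".
HTML = """
<table><tbody>
<tr class="" data-row="2">
<td align="right" csk="13">13</td>
<td align="left" csk="Jones,Andre"><a href="/players/andre-jones-2.html">Andre Jones</a></td>
</tr>
<tr class="" data-row="3">
<td align="right" csk="7">7</td>
<td align="left" csk="Jones,DeAndre"><a href="/players/deandre-jones-1.html">DeAndre Jones</a></td>
</tr>
<tr class="" data-row="4">
<td align="right" csk="21">21</td>
<td align="left" csk="Jones,Andre"><a href="/players/andre-jones-3.html">Andre Jones</a></td>
</tr>
</tbody></table>
"""


def rows_for_player(table, name):
    """Hypothetical helper: every <tr> that contains an <a> whose string equals name."""
    rows = []
    seen = set()  # track rows by identity so a row is listed once
    for anchor in table.find_all("a", string=name):
        row = anchor.find_parent("tr")
        if row is not None and id(row) not in seen:
            seen.add(id(row))
            rows.append(row)
    return rows


soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("tbody")

my_rows = rows_for_player(table, "Andre Jones")
print([row["data-row"] for row in my_rows])
```

The helper returns an empty list when the name is absent, which avoids calling find_parent on None as the one-line version would.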
Upvotes: 2
Reputation: 298206
You could do this easily with XPath and lxml:
import lxml.html
root = lxml.html.fromstring('''...''')
rows = root.xpath('//tr[.//a[text() = "FooName"]]')  # list of every matching <tr>
The BeautifulSoup "equivalent" would be something like:
rows = soup.find('tbody').find_all('tr')
row = next((row for row in rows if row.find('a', text='FooName')), None)  # first matching <tr>, or None
Or if you think about it backwards:
row = soup.find('a', text='FooName').find_parent('tr')
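For reference, a runnable version of the XPath approach against markup shaped like the question's table might look like the sketch below. It assumes Python 3 with lxml installed; the closing tags and the duplicated row 4 are additions made for the example, not part of the original page.

```python
import lxml.html

# Markup adapted from the question; row 4 is an invented duplicate of "Andre Jones".
HTML = """<table><tbody>
<tr class="" data-row="2">
<td align="right" csk="13">13</td>
<td align="left" csk="Jones,Andre"><a href="/players/andre-jones-2.html">Andre Jones</a></td>
</tr>
<tr class="" data-row="3">
<td align="right" csk="7">7</td>
<td align="left" csk="Jones,DeAndre"><a href="/players/deandre-jones-1.html">DeAndre Jones</a></td>
</tr>
<tr class="" data-row="4">
<td align="right" csk="21">21</td>
<td align="left" csk="Jones,Andre"><a href="/players/andre-jones-3.html">Andre Jones</a></td>
</tr>
</tbody></table>"""

root = lxml.html.fromstring(HTML)

# xpath() returns a list, so duplicated names yield every matching row.
# The equality test is exact, so "DeAndre Jones" is not selected.
rows = root.xpath('//tr[.//a[text() = "Andre Jones"]]')

print([row.get("data-row") for row in rows])
```

Because the predicate tests the anchor's text for equality, no separate step is needed to exclude names that merely contain the search string.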
Upvotes: 3