sdgd

Reputation: 733

Parsing <TR> </TR> tags and printing the elements using BeautifulSoup in Python

I'm new to Python and I'm working through problems to improve my coding skills. I submitted a form using Python, and from the page that is displayed after the form is submitted I want to collect some data and show it in my output. The data I need sits between <TR> </TR> tags, and there are many <TR> </TR> blocks like that on the page.

For reference:

<TR class="even"><TD class="id">6422275</TD><TD class="date"><NOBR>09:06:49</NOBR><BR><NOBR>27 Feb 2016</NOBR></TD><TD class="coder"><A HREF="author.aspx?id=201837">THE_ROCK</A></TD><TD class="problem"><A HREF="problem.aspx?space=1&amp;num=1000">1000<SPAN CLASS="problemname">. A+B Problem</SPAN></A></TD><TD class="language">Python 2.7</TD><TD class="verdict_ac">Accepted</TD><TD class="test"><BR></TD><TD class="runtime">0.015</TD><TD class="memory">160 KB</TD></TR>

So, from the whole HTML page, I want to look for the name THE_ROCK (it appears in the tag given above), and if it exists on the page, I want to print the elements (problem, problemname, verdict_ac, runtime and memory) from that particular <TR> </TR> block. I understand that I can use BeautifulSoup, but I don't know how to compare values and print only the elements / tags that I need.

Code:

res = br.submit()
final_url = res.geturl()
html_doc = br.open(final_url)
html_read = (html_doc.read())
soup = BeautifulSoup(data, convertEntities=BeautifulSoup.HTML_ENTITIES)
for row in soup.find_all('TR'):
    print '\n'.join(row.stripped_strings)

I'm trying to find the TR tags, but this doesn't work and no output is printed. Can someone tell me what I'm doing wrong? Is my approach wrong, or is the flow wrong? Why am I not getting the expected output? Thanks in advance. Any help would be much appreciated.

Upvotes: 1

Views: 2865

Answers (2)

alecxe

Reputation: 473873

Find the element by text and locate the tr parent using find_parent():

tr = soup.find(text="THE_ROCK").find_parent("tr")
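To make the one-liner above concrete, here is a minimal sketch of how it could be used to print the cells of the matching row. It assumes Python 3 and bs4 with the built-in "html.parser", uses a trimmed copy of the row from the question as sample data (the <TABLE> wrapper is added only to make the snippet well-formed), and passes string= instead of text=, which newer bs4 releases prefer. The None guard is an addition, since find() returns None when the name is not on the page.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the row from the question (sample data, not a live page).
html = (
    '<TABLE><TR class="even"><TD class="id">6422275</TD>'
    '<TD class="coder"><A HREF="author.aspx?id=201837">THE_ROCK</A></TD>'
    '<TD class="verdict_ac">Accepted</TD>'
    '<TD class="runtime">0.015</TD><TD class="memory">160 KB</TD></TR></TABLE>'
)

soup = BeautifulSoup(html, "html.parser")

# find() returns None when the text is absent, so guard before chaining.
match = soup.find(string="THE_ROCK")
tr = match.find_parent("tr") if match is not None else None

cells = []
if tr is not None:
    # Collect the visible text of every cell in the enclosing row.
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]

print(cells)
```

Note that string="THE_ROCK" matches the text node exactly; if the page pads the name with whitespace, a compiled regular expression or a function would be needed instead.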

Upvotes: 2

antikantian

Reputation: 620

Maybe this will help:

soup = BeautifulSoup(devs_html, 'html.parser')
row = soup.find_all('tr', class_='even')

In [195]: row
Out[195]: [<tr class="even"><td class="id">6422275</td><td class="date"><nobr>09:06:49</nobr><br><nobr>27 Feb 2016</nobr></br></td><td class="coder"><a href="author.aspx?id=201837">THE_ROCK</a></td><td class="problem"><a href="problem.aspx?space=1&amp;num=1000">1000<span class="problemname">. A+B Problem</span></a></td><td class="language">Python 2.7</td><td class="verdict_ac">Accepted</td><td class="test"><br/></td><td class="runtime">0.015</td><td class="memory">160 KB</td></tr>]

In [196]: row[0].contents
Out[196]: 
[<td class="id">6422275</td>,
<td class="date"><nobr>09:06:49</nobr><br><nobr>27 Feb 2016</nobr></br></td>,
<td class="coder"><a href="author.aspx?id=201837">THE_ROCK</a></td>,
<td class="problem"><a href="problem.aspx?space=1&amp;num=1000">1000<span class="problemname">. A+B Problem</span></a></td>,
<td class="language">Python 2.7</td>,
<td class="verdict_ac">Accepted</td>,
<td class="test"><br/></td>,
<td class="runtime">0.015</td>,
<td class="memory">160 KB</td>]

Ok, so basically we just searched for rows by tag (<tr>: table row) and by the class "even". That should give you a list of rows that you can iterate over.

Just taking a single row, row[0], as an example, you can see that all the data cells (<td>) are contained in the row. To get the info out of them, you can do:

In [197]: row[0].find(class_='id').text
Out[197]: u'6422275'

In [198]: row[0].find(class_='coder').text
Out[198]: u'THE_ROCK'

And so on, until you have all the info you want.
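The "and so on" step can be sketched as a small loop that collects the fields named in the question into a dict. This is only an illustration under some assumptions: Python 3, bs4 with "html.parser", a trimmed copy of the question's row as sample data (wrapped in <TABLE> for well-formedness), and the names wanted and record, which are made up for this example.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the row from the question (sample data only).
html = (
    '<TABLE><TR class="even">'
    '<TD class="coder"><A HREF="author.aspx?id=201837">THE_ROCK</A></TD>'
    '<TD class="problem"><A HREF="problem.aspx?space=1&amp;num=1000">1000'
    '<SPAN CLASS="problemname">. A+B Problem</SPAN></A></TD>'
    '<TD class="verdict_ac">Accepted</TD>'
    '<TD class="runtime">0.015</TD><TD class="memory">160 KB</TD>'
    '</TR></TABLE>'
)

soup = BeautifulSoup(html, "html.parser")
row = soup.find("tr", class_="even")

# Cell classes taken from the question's markup.
wanted = ["coder", "problem", "verdict_ac", "runtime", "memory"]

record = {}
for name in wanted:
    cell = row.find("td", class_=name)
    # A missing cell is stored as None instead of raising AttributeError.
    record[name] = cell.get_text(strip=True) if cell is not None else None

for name in wanted:
    print(name, "=", record[name])
```

The verdict cell is looked up by the class verdict_ac seen in the sample row; rows with a different verdict may well use a different class, in which case this sketch records None for that field.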

EDIT: Ok, if you just want to find THE_ROCK and print the row:

for r in row:
    if 'THE_ROCK' in r.find(class_='coder').text:
        print(r)

That should give you the entire row, and you can do whatever you'd like with it.
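One caveat worth illustrating: class_='even' only returns rows with that class, and the HTML parsers used by bs4 lowercase tag names, so a search for 'TR' finds nothing even though the page writes the tags in upper case. The sketch below searches every 'tr' row instead. It assumes Python 3 and "html.parser"; the second row (class "odd", SOMEONE_ELSE, id=1, 0.031) is hypothetical data invented to show the filter rejecting a row.

```python
from bs4 import BeautifulSoup

# First row is trimmed from the question; the second row is made up.
html = """
<TABLE>
<TR class="even"><TD class="coder"><A HREF="author.aspx?id=201837">THE_ROCK</A></TD><TD class="runtime">0.015</TD></TR>
<TR class="odd"><TD class="coder"><A HREF="author.aspx?id=1">SOMEONE_ELSE</A></TD><TD class="runtime">0.031</TD></TR>
</TABLE>
"""

soup = BeautifulSoup(html, "html.parser")

# Tag names are lowercased during parsing, so the upper-case search is empty.
upper_hits = len(soup.find_all("TR"))
lower_hits = len(soup.find_all("tr"))
print(upper_hits, lower_hits)

# Keep only the rows whose coder cell is exactly THE_ROCK, whatever the row class.
matches = []
for r in soup.find_all("tr"):
    coder = r.find("td", class_="coder")
    if coder is not None and coder.get_text(strip=True) == "THE_ROCK":
        matches.append(r)

runtimes = [m.find("td", class_="runtime").get_text(strip=True) for m in matches]
print(runtimes)
```

Comparing with == rather than in avoids matching longer names that merely contain THE_ROCK; use in if a substring match is what you want.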

Upvotes: 1
