Vikas Mishra

Reputation: 65

Find an HTML tag using BeautifulSoup in Python

I want to find a specific tag in some HTML. If several tags share the same id, how can I get the contents of the second tag rather than the first one, which is what soup.find(id='contact1') returns? Here is the example HTML:

<table align="center"><th id="contact">STUDENT ID</th><th id="contact">NAME</th><th id="contact">   Phone </th><th id="contact"> NO.</th>
<p align="center" style="display:compact; font-size:18px; font-family:Arial, Helvetica, sans-serif; color:#CC3300">
</p><tr>
<td id="contact1">
2011XXA4438F </td> <td id="contact1"> SAM SRINIVAS KRISHNAGOPAL</td> <td id="contact1"> 9894398690 </td> <td id="contact1"> </td>
</tr>
</table>

What I want to do is extract '2011XXA4438F' as a string. How can I do this?

Upvotes: 2

Views: 5043

Answers (3)

Babatunde Mustapha

Reputation: 2663

You can also pass the tag name and an attribute dictionary. Note that the id is on the td cells, not on the table:
target = soup.find("td", {"id": "contact1"})

Upvotes: 0

Splurk

Reputation: 873

I'm pretty sure .find only gives you the first element that matches your query. Try using .findAll instead.

Check documentation here - http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html

EDIT: I misread your post. Just to understand completely: do you always want to find the 2nd occurrence of "id='contact1'"?

There is probably something more elegant, but you could do something like

v = soup.find_all(id='contact1')
for count, x in enumerate(v, start=1):
    if count == 2:  # set the number according to which occurrence you want
        # x is the second occurrence of id='contact1'
        print(x.text.strip())

The above is completely untested and was written directly here. I've only just started using Python, so there is probably a more efficient way of doing it :-)
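Since find_all returns a list-like result, the counting loop can probably be replaced by plain indexing. Here is a minimal sketch, assuming BeautifulSoup 4 (the bs4 package) with the built-in "html.parser" backend and a trimmed copy of the row from the question. The helper name nth_text is made up for illustration and is not part of the library.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table row from the question.
html = """
<table align="center"><tr>
<td id="contact1">
2011XXA4438F </td> <td id="contact1"> SAM SRINIVAS KRISHNAGOPAL</td> <td id="contact1"> 9894398690 </td> <td id="contact1"> </td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")


def nth_text(soup, element_id, n):
    """Hypothetical helper: stripped text of the n-th (0-based) tag
    carrying this id, or None if there are not that many matches."""
    matches = soup.find_all(id=element_id)
    if not 0 <= n < len(matches):
        return None
    return matches[n].get_text(strip=True)


print(nth_text(soup, "contact1", 0))  # first match
print(nth_text(soup, "contact1", 1))  # second match
```

Returning None for an out-of-range index mirrors what find does when nothing matches, so the caller can check the result the same way in both cases.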

Upvotes: 1

TerryA

Reputation: 60024

The <td id="contact1"> cell containing 2011XXA4438F is the first tag with an id of "contact1", so soup.find is all you need to obtain it:

>>> print(soup.find(id='contact1').text.strip())
2011XXA4438F

If you're looking for the other tags with that id, then you'll want to use find_all, which returns every match in document order:

>>> print(soup.find_all(id='contact1'))
[<td id="contact1">
2011XXA4438F </td>, <td id="contact1"> SAM SRINIVAS KRISHNAGOPAL</td>, <td id="contact1"> 9894398690 </td>, <td id="contact1"> </td>]
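To go from that list of tags to plain strings, one option is a list comprehension over the find_all result. This is only a sketch, assuming BeautifulSoup 4 (bs4) with the "html.parser" backend and a trimmed copy of the question's row. The variable names cells and student_id are just illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table row from the question.
html = """
<table align="center"><tr>
<td id="contact1">
2011XXA4438F </td> <td id="contact1"> SAM SRINIVAS KRISHNAGOPAL</td> <td id="contact1"> 9894398690 </td> <td id="contact1"> </td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Stripped text of every <td> with id="contact1", in document order.
# The last cell holds only whitespace, so it becomes an empty string.
cells = [td.get_text(strip=True) for td in soup.find_all("td", id="contact1")]

student_id = cells[0]  # the value the question asks for

print(cells)
print(student_id)
```

Indexing into cells (cells[1], cells[2], ...) then gives any of the later cells as well.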

Upvotes: 4
