guidetuanhp

Reputation: 43

How to get the text content of an HTML element with bs4

I want to extract this text: "Security code: 0905793" from the following HTML:

<table dir="ltr">
<tr><td id="i1" style="padding:0; font-family:'Segoe UI Semibold', 'Segoe UI Bold', 'Segoe UI', 'Helvetica Neue Medium', Arial, sans-serif; font-size:17px; color:#707070;">Microsoft account</td></tr>
<tr><td id="i2" style="padding:0; font-family:'Segoe UI Light', 'Segoe UI', 'Helvetica Neue Medium', Arial, sans-serif; font-size:41px; color:#2672ec;">Security code</td></tr>
<tr><td id="i3" style="padding:0; padding-top:25px; font-family:'Segoe UI', Tahoma, Verdana, Arial, sans-serif; font-size:14px; color:#2a2a2a;">
Please use the following security code for the Microsoft account <a dir="ltr" id="iAccount" class="link" style="color:#2672ec; text-decoration:none" href="mailto:am*****@hotmail.com" target="_blank">am*****@hotmail.com</a>.
</td></tr>
<tr><td id="i4" style="padding:0; padding-top:25px; font-family:'Segoe UI', Tahoma, Verdana, Arial, sans-serif; font-size:14px; color:#2a2a2a;">
Security code: <span style="font-family:'Segoe UI Bold', 'Segoe UI Semibold', 'Segoe UI', 'Helvetica Neue Medium', Arial, sans-serif; font-size:14px; font-weight:bold; color:#2a2a2a;">0905793</span>
</td></tr>
<tr><td id="i5" style="padding:0; padding-top:25px; font-family:'Segoe UI', Tahoma, Verdana, Arial, sans-serif; font-size:14px; color:#2a2a2a;">
If you don't recognize the Microsoft account <a dir="ltr" id="iAccount" class="link" style="color:#2672ec; text-decoration:none" href="mailto:am*****@hotmail.com" target="_blank">am*****@hotmail.com</a>, you can <a id="iLink2" class="link" style="color:#2672ec; text-decoration:none" href="https://account.live.com/dp?ft=DS*vlOns7g2o2VsFA6DyYh9rIME5JQvIu5BBuxlhWl3d3PthvzHcoV9C9WuyZPIdOmKP7IBBTC7GWtI*TuFa0kLmt2COs!WXd2uaCyjW9JNzLYRZ4WUeGg0gjOD9qp2Fu5n34sS41OUI0bpzq7dPIpQKPFz4l4bVM3Mg0R1pUWmJmIPg95OIfPQATptlOiZdoyBHvQnOW4d0tDJb3jDZk4*ub0vmFr2GDDMVrFsU5qb0wvIi2kR1hIRZqA6Z4JqBmoGAjUpfe1xaTAYJ3IorDUzDiXHI*aZ8iDK1krwGyNU45NDjvobZlUfb84Z1fJrdXw%24%24" target="_blank">click here</a> to remove your email address from that account.
</td></tr>
<tr><td id="i6" style="padding:0; padding-top:25px; font-family:'Segoe UI', Tahoma, Verdana, Arial, sans-serif; font-size:14px; color:#2a2a2a;">Thanks,</td></tr>
<tr><td id="i7" style="padding:0; font-family:'Segoe UI', Tahoma, Verdana, Arial, sans-serif; font-size:14px; color:#2a2a2a;">The Microsoft account team</td></tr>

Can you show me how to do this? I used bs4 to find the element with id="i4", but it only returns the whole tag:

[<td id="i4" style="padding:0; padding-top:25px; font-family:'Segoe UI', Tahoma, Verdana, Arial, sans-serif; font-size:14px; color:#2a2a2a;">
Security code: <span style="font-family:'Segoe UI Bold', 'Segoe UI Semibold', 'Segoe UI', 'Helvetica Neue Medium', Arial, sans-serif; font-size:14px; font-weight:bold; color:#2a2a2a;">0905793</span>
</td>]

Upvotes: 1

Views: 237

Answers (3)

MendelG

Reputation: 20018

You can pass id= as a keyword argument to .find() and then read the .text attribute:

print(soup.find("td", id="i4").text.strip())
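For completeness, here is a minimal, self-contained sketch of the same idea. The shortened HTML snippet and the helper name `text_by_id` are assumptions made for illustration; they are not part of the original answer. The helper also guards against `.find()` returning `None` when no element matches, which would otherwise raise an `AttributeError` on `.text`.

```python
from bs4 import BeautifulSoup

# Shortened version of the HTML from the question (styles trimmed for brevity).
html_doc = """<table dir="ltr">
<tr><td id="i4">
Security code: <span style="font-weight:bold;">0905793</span>
</td></tr>
</table>"""

soup = BeautifulSoup(html_doc, "html.parser")


def text_by_id(soup, element_id):
    """Hypothetical helper: return the stripped text of the <td> with this id, or None."""
    cell = soup.find("td", id=element_id)
    if cell is None:
        # .find() returns None when nothing matches, so check before using .text.
        return None
    return cell.text.strip()


print(text_by_id(soup, "i4"))       # Security code: 0905793
print(text_by_id(soup, "missing"))  # None
```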

Upvotes: 1

Andrej Kesely

Reputation: 195428

You can use the .get_text() method:

from bs4 import BeautifulSoup

html_doc = """<table dir="ltr">
<tr><td id="i1" style="padding:0; font-family:'Segoe UI Semibold', 'Segoe UI Bold', 'Segoe UI', 'Helvetica Neue Medium', Arial, sans-serif; font-size:17px; color:#707070;">Microsoft account</td></tr>
<tr><td id="i2" style="padding:0; font-family:'Segoe UI Light', 'Segoe UI', 'Helvetica Neue Medium', Arial, sans-serif; font-size:41px; color:#2672ec;">Security code</td></tr>
<tr><td id="i3" style="padding:0; padding-top:25px; font-family:'Segoe UI', Tahoma, Verdana, Arial, sans-serif; font-size:14px; color:#2a2a2a;">
Please use the following security code for the Microsoft account <a dir="ltr" id="iAccount" class="link" style="color:#2672ec; text-decoration:none" href="mailto:am*****@hotmail.com" target="_blank">am*****@hotmail.com</a>.
</td></tr>
<tr><td id="i4" style="padding:0; padding-top:25px; font-family:'Segoe UI', Tahoma, Verdana, Arial, sans-serif; font-size:14px; color:#2a2a2a;">
Security code: <span style="font-family:'Segoe UI Bold', 'Segoe UI Semibold', 'Segoe UI', 'Helvetica Neue Medium', Arial, sans-serif; font-size:14px; font-weight:bold; color:#2a2a2a;">0905793</span>
</td></tr>
<tr><td id="i5" style="padding:0; padding-top:25px; font-family:'Segoe UI', Tahoma, Verdana, Arial, sans-serif; font-size:14px; color:#2a2a2a;">
If you don't recognize the Microsoft account <a dir="ltr" id="iAccount" class="link" style="color:#2672ec; text-decoration:none" href="mailto:am*****@hotmail.com" target="_blank">am*****@hotmail.com</a>, you can <a id="iLink2" class="link" style="color:#2672ec; text-decoration:none" href="https://account.live.com/dp?ft=DS*vlOns7g2o2VsFA6DyYh9rIME5JQvIu5BBuxlhWl3d3PthvzHcoV9C9WuyZPIdOmKP7IBBTC7GWtI*TuFa0kLmt2COs!WXd2uaCyjW9JNzLYRZ4WUeGg0gjOD9qp2Fu5n34sS41OUI0bpzq7dPIpQKPFz4l4bVM3Mg0R1pUWmJmIPg95OIfPQATptlOiZdoyBHvQnOW4d0tDJb3jDZk4*ub0vmFr2GDDMVrFsU5qb0wvIi2kR1hIRZqA6Z4JqBmoGAjUpfe1xaTAYJ3IorDUzDiXHI*aZ8iDK1krwGyNU45NDjvobZlUfb84Z1fJrdXw%24%24" target="_blank">click here</a> to remove your email address from that account.
</td></tr>
<tr><td id="i6" style="padding:0; padding-top:25px; font-family:'Segoe UI', Tahoma, Verdana, Arial, sans-serif; font-size:14px; color:#2a2a2a;">Thanks,</td></tr>
<tr><td id="i7" style="padding:0; font-family:'Segoe UI', Tahoma, Verdana, Arial, sans-serif; font-size:14px; color:#2a2a2a;">The Microsoft account team</td></tr>"""

soup = BeautifulSoup(html_doc, "html.parser")

code = soup.select_one("td#i4").get_text(strip=True, separator=" ")
print(code)

Prints:

Security code: 0905793
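As a small follow-up sketch, the snippet below illustrates why `separator=" "` matters when `strip=True` is used, and how one might grab only the digits from the nested span. The shortened HTML and the selector `td#i4 span` are assumptions based on the markup in the question, not something the answer itself prescribes.

```python
from bs4 import BeautifulSoup

# Shortened version of the HTML from the question (styles trimmed for brevity).
html_doc = """<table dir="ltr">
<tr><td id="i4">
Security code: <span style="font-weight:bold;">0905793</span>
</td></tr>
</table>"""

soup = BeautifulSoup(html_doc, "html.parser")
cell = soup.select_one("td#i4")

# With strip=True each text fragment is stripped before joining,
# so without a separator the label and the code run together.
no_separator = cell.get_text(strip=True)
with_separator = cell.get_text(strip=True, separator=" ")

# To get only the code, select the <span> inside the cell.
span = soup.select_one("td#i4 span")
digits = span.get_text(strip=True) if span is not None else None

print(no_separator)    # Security code:0905793
print(with_separator)  # Security code: 0905793
print(digits)          # 0905793
```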

Upvotes: 1

imxitiz

Reputation: 3987

If that is the output you are getting, try this to extract the text from it:

print(td[0].text)

Here, td is the variable that holds your list of td elements.
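The square brackets in the question's output suggest the result came from a method that returns a list, presumably `find_all()` or `select()`; that is an assumption, since the question does not show the call. A minimal sketch of handling such a list, using a shortened copy of the HTML:

```python
from bs4 import BeautifulSoup

# Shortened version of the HTML from the question (styles trimmed for brevity).
html_doc = """<table dir="ltr">
<tr><td id="i4">
Security code: <span style="font-weight:bold;">0905793</span>
</td></tr>
</table>"""

soup = BeautifulSoup(html_doc, "html.parser")

# find_all() returns a list-like ResultSet, even when only one element matches.
td = soup.find_all("td", id="i4")

# Index into the list for a single element...
first_text = td[0].text.strip()

# ...or loop over it to collect the text of every match.
texts = [cell.get_text(" ", strip=True) for cell in td]

print(len(td))      # 1
print(first_text)   # Security code: 0905793
print(texts)        # ['Security code: 0905793']
```

If only one element is expected, calling `find()` or `select_one()` instead avoids the list entirely.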

Upvotes: 1
