Laura

Reputation: 9

Strip email and text from a full tag

How can I correctly extract the email address and the text between the <a href=...> and </a> tags?

My code:

import re
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup


url = input("Enter url -")
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")

# Retrieve every tag whose href contains "mailto"
count = 0
tags = soup.find_all(href=re.compile("mailto"))
for tag in tags:
    count += 1
    print(tag)
print("Total amount of mails:", count)

My program prints the full tag, <a href="mailto:[email protected]">John Test</a>, but I want only the email address and the name. How can I extract them correctly?

Upvotes: 0

Views: 104

Answers (1)

Funpy97

Reputation: 337

You can try it this way:


from bs4 import BeautifulSoup

html = """<a href="mailto:[email protected]">John Test</a>"""

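# Pass the parser name once, as the second positional argument
soup = BeautifulSoup(html, "html.parser")

for element in soup.find_all("a"):
    href = element.get("href", "")  # anchors without an href yield ""
    if href.startswith("mailto:"):
        email = href.split(":", 1)[1]  # everything after "mailto:"
        name = element.text

        print(email, name)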

Output

[email protected] John Test
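
If some links carry extra parameters, such as mailto:...?subject=Hello, splitting on ":" keeps the query string attached to the address. One possible way around that, as a minimal sketch using only the standard library, is to let urllib.parse separate the parts. The helper name split_mailto and the example.com address are made up for illustration; they do not come from the question.

```python
from urllib.parse import urlsplit, unquote


def split_mailto(href):
    """Return the address part of a mailto: link, or None for other links.

    split_mailto is a hypothetical helper name used only for this sketch.
    """
    parts = urlsplit(href.strip())
    if parts.scheme != "mailto":
        return None
    # The path holds the address; any ?subject=... lands in parts.query.
    # unquote turns percent-escapes such as %40 back into "@".
    return unquote(parts.path)


# Placeholder address, chosen only for the example
print(split_mailto("mailto:jane.doe@example.com?subject=Hello"))
```

Inside the loop, the value of element["href"] could be passed to this helper in place of the split(":") call, skipping any link for which it returns None.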

Upvotes: 2
