Reputation: 9
How can I correctly extract the email address and the text between the <a href=...> and </a> tags?
My code:
import re
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
url = input("Enter url -")
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")
# Retrieve all of the anchor tags
count = 0
tags = soup.find_all(href=re.compile("mailto"))
for tag in tags:
    count += 1
    print(tag)
print("Total amount of mails:", count)
My program prints the full tag, <a href="mailto:[email protected]">John Test</a>,
but I want only the email address and the name. How can I extract them correctly?
Upvotes: 0
Views: 104
Reputation: 337
You can try it this way:
from bs4 import BeautifulSoup
html = """<a href="mailto:[email protected]">John Test</a>"""
soup = BeautifulSoup(html, "html.parser")
for element in soup.find_all("a"):
    # .get() avoids a KeyError on anchors that have no href attribute
    if "mailto" in element.get("href", ""):
        email = element["href"].split(":", 1)[1]
        name = element.text
        print(email, name)
Output
[email protected] John Test
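One caveat: a mailto link can carry extra parts such as ?subject=..., and in that case splitting on ":" returns more than the address. Below is a minimal sketch that parses the href with the standard library's urllib.parse instead. The helper name parse_mailto and the example.com addresses are made up for illustration; they are not from the question.

```python
from urllib.parse import unquote, urlsplit


def parse_mailto(href):
    """Return the address part of a mailto: link, or None for other links."""
    parts = urlsplit(href.strip())
    # urlsplit lowercases the scheme, so "MAILTO:" links match as well.
    if parts.scheme != "mailto":
        return None
    # parts.path is everything between "mailto:" and an optional "?query".
    # unquote() decodes percent-escapes such as %2B.
    return unquote(parts.path)


print(parse_mailto("mailto:jane.doe@example.com?subject=Hello"))
print(parse_mailto("https://example.com/contact"))
```

In the loop above you could then write email = parse_mailto(element.get("href", "")) and skip the element when the result is None.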
Upvotes: 2