How to check if email exists in p tag using Beautiful Soup?

Question

I'm using Beautiful Soup to check whether there is an email address in a paragraph tag within a div tag. I'm looping over a list of the divs:

for div in list_of_divs:

Where each div looks like this:

<div>
  <p>Hello</p>
  <p>hereIsAnEmail@gmail.com</p>
</div>

Within the for loop, I have:

email = div.find(name="p", string=re.compile("^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$"))

The name="p" is working fine, but I'm not sure what to put for the string. Any help or direction is appreciated.

Wiktor Stribiżew · Accepted Answer

You may use

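import re
from bs4 import BeautifulSoup

html = """
<div>
  <p>Hello</p>
  <p>hereIsAnEmail@gmail.com</p>
</div>
"""
soup = BeautifulSoup(html, "html5lib")  # needs the html5lib package installed
list_of_divs = soup.find_all('div')
for div in list_of_divs:
    # Keep only the <p> tags whose whole text matches the email pattern
    emails = div.find_all("p", string=re.compile(r"^[\w.-]+@(?:[\w-]+\.)+\w{2,4}$"))
    print([em.text for em in emails])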

Output: ['hereIsAnEmail@gmail.com']

Note that ^[\w.-]+@(?:[\w-]+\.)+\w{2,4}$ is quite restrictive. You might want to use a more generic pattern like ^\S+@\S+\.\S+$, which matches one or more non-whitespace characters, @, one or more non-whitespace characters, a dot, and again one or more non-whitespace characters.
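To see how the two patterns quoted above differ in practice, here is a small sketch that runs both against a few strings. The sample addresses are made up for illustration, and the names STRICT, LOOSE, samples and results are not from the original answer.

```python
import re

# The two patterns quoted in the answer.
STRICT = re.compile(r"^[\w.-]+@(?:[\w-]+\.)+\w{2,4}$")
LOOSE = re.compile(r"^\S+@\S+\.\S+$")

# Hypothetical sample strings, chosen to hit the strict pattern's limits.
samples = [
    "hereIsAnEmail@gmail.com",
    "first+tag@example.com",    # '+' is not allowed by [\w.-]
    "someone@example.museum",   # last label is longer than 4 characters
    "Hello",
]

# Map each sample to (matches strict pattern, matches loose pattern).
results = {s: (bool(STRICT.match(s)), bool(LOOSE.match(s))) for s in samples}

for s, (strict_ok, loose_ok) in results.items():
    print(f"{s!r}: strict={strict_ok}, loose={loose_ok}")
```

The loose pattern accepts anything shaped like something@something.something, so it will also let through strings that are not valid addresses. Which trade-off is right depends on how clean the scraped text is.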

Notes on the code:

With div.find_all("p", string=re.compile(r"^[\w.-]+@(?:[\w-]+\.)+\w{2,4}$")), you get all p tags inside the current div element whose text fully matches the regex pattern.
print([em.text for em in emails]) prints just the text of each p node found, i.e. those that contain only an email address.
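If all you need is a yes/no answer per div, as the question title suggests, the same lookup can be wrapped in a small helper. This is only a sketch: div_has_email and EMAIL_RE are hypothetical names, the second div is invented sample data, and the built-in "html.parser" is used here instead of html5lib so that no extra parser package is needed.

```python
import re
from bs4 import BeautifulSoup

# Loose email-like pattern from the answer; swap in the strict one if preferred.
EMAIL_RE = re.compile(r"^\S+@\S+\.\S+$")

def div_has_email(div):
    """Return True if some <p> inside div contains only an email-like string."""
    # find() returns the first matching tag, or None when nothing matches.
    return div.find("p", string=EMAIL_RE) is not None

html = """
<div><p>Hello</p><p>hereIsAnEmail@gmail.com</p></div>
<div><p>Hello</p><p>No address here</p></div>
"""
soup = BeautifulSoup(html, "html.parser")

# One boolean per div, in document order.
flags = [div_has_email(div) for div in soup.find_all("div")]
print(flags)
```

Because the pattern is anchored with ^ and $, a p tag only counts when the address is its entire text. A paragraph such as "Contact: someone@example.com" would not be reported by this helper.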
