How to search for specific word using BS4, then get text in same element immediately after that word?

Question

I'm very new to BeautifulSoup and to Python. I am crawling some pages where sometimes a phone number is given and sometimes it is not. If it's there, I want to scrape it. The HTML is very simple:


    <p>Email: someone@somewhere.com</p>
    <p>Telephone: 1234567890</p>
    <p>Postal code: B3H 2F5</p>

I am checking to see if the phone number is there like this:

phoneNumber = soup.find(string='Telephone:')
if phoneNumber:
    phoneNumber = # Some code here to get the actual number 
else:
    phoneNumber = ('None')
print (phoneNumber)

There are usually several other p tags in that div, but the same ones aren't always there, so I can't rely on them as reference points. The phone number doesn't always follow the same pattern, either. The best I can do is identify that a phone number is always preceded by 'Telephone:' and is wrapped in a p tag. This seems to be the only surefire way to locate it.

What I don't understand is how to get the actual phone number, that is, anything in the <p> tag after 'Telephone:'.

How do I get the numbers in this element after the word 'Telephone:'?

Sebastien D · Accepted Answer

With a regular expression, you can directly find the <p> tag containing the phone number:

import re
from bs4 import BeautifulSoup

html = """
    <p>Email: someone@somewhere.com</p>

    <p>Postal code: B3H 2F5</p>
    <p>Telephone: 1234567890</p>
"""

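# Name the parser explicitly to avoid the "no parser was specified" warning
soup = BeautifulSoup(html, 'html.parser')

# Find the <p> tag whose text contains "Telephone:"
# (string= is the current name for the older text= argument)
phone_tag = soup.find('p', string=re.compile('Telephone:'))

if phone_tag:
    # Drop the label and surrounding whitespace, keeping only the number
    phone = phone_tag.text.replace('Telephone:', '').strip()
else:
    phone = None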
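
As a side note, `soup.find(string='Telephone:')` in the question only matches a string that is exactly equal to `'Telephone:'`, so it returns `None` when the number sits in the same string as the label. The variant below is a minimal sketch of the same idea wrapped in a reusable function. It assumes the markup looks like the sample (the `<div>` and `<p>` wrappers are a guess at the real page), and `extract_phone`, `LABEL_PATTERN` and `SAMPLE_HTML` are names made up for illustration. It uses a capture group instead of `replace()`, and `get_text()` so that a label wrapped in a nested tag such as `<b>` is still found.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup; the <div> and <p> wrappers are assumed.
SAMPLE_HTML = """
<div>
    <p>Email: someone@somewhere.com</p>
    <p>Telephone: 1234567890</p>
    <p>Postal code: B3H 2F5</p>
</div>
"""

# Capture whatever follows the label, up to the end of the line.
LABEL_PATTERN = re.compile(r"Telephone:\s*(.+)")


def extract_phone(soup):
    """Return the text after 'Telephone:' in the first matching <p>, or None."""
    for p in soup.find_all("p"):
        # get_text() joins all nested strings, so a label inside <b> also matches.
        match = LABEL_PATTERN.search(p.get_text(" ", strip=True))
        if match:
            return match.group(1).strip()
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(extract_phone(soup))
```

Returning `None` when no paragraph carries the label keeps the "sometimes it is not there" case explicit, so the caller can decide what to print or store.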
