Reputation: 117
I'm trying to scrape data from a website that has its information inside a single <p> tag. The only data I'm interested in is the contact details, which are in that same <p> tag. How can I extract only the data I need?
Here is a screenshot of the website. How can I get the text from "Company" through the "Tel" number?
Upvotes: 1
Reputation: 195573
You can use the re module to parse the text.
For example:
import re
import requests
from bs4 import BeautifulSoup
url = 'https://www.forpressrelease.com/forpressrelease/553538/4/china-leading-cabinet-handles-supplier-rochehandle-celebrates-success-of-entering-european-market'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
# Flatten the article body into plain text, one text node per line
txt = soup.select_one('.single_page_content').get_text(strip=True, separator='\n')
# Each value follows its label; \s* also skips a line break between them.
# [0] raises IndexError if a label is missing from the page.
company = re.findall(r'Company:\s*(.*)', txt)[0]
address = re.findall(r'Address:\s*(.*)', txt)[0]
contact = re.findall(r'Contact:\s*(.*)', txt)[0]
# Email: match lazily (across lines, thanks to re.S) up to the next "Label:"
email = re.findall(r'Email:\s*(.*?)\s*(?=\w+:)', txt, flags=re.S)[0]
tel = re.findall(r'Tel:\s*(.*)', txt)[0]
mob = re.findall(r'Mob:\s*(.*)', txt)[0]
# Url is the last field; with re.S, (.*) runs to the end of the text.
# Named site_url so it does not overwrite the page url above.
site_url = re.findall(r'Url\s*:\s*-\s*(.*)', txt, flags=re.S)[0]
print('{:<15}: {}'.format('Company', company))
print('{:<15}: {}'.format('Address', address))
print('{:<15}: {}'.format('Contact', contact))
print('{:<15}: {}'.format('Email', email))
print('{:<15}: {}'.format('Tel', tel))
print('{:<15}: {}'.format('Mob', mob))
print('{:<15}: {}'.format('Url', site_url))
Prints:
Company        : Dongguan Roche Industrial Co., Ltd
Address        : No.83, XiZheng 1st Road, Shajiao Community, Humen Town, Dongguan City, Guangdong Province, China 523936
Contact        : Robin Luo
Email          : [email protected]
Tel            : 0769-89366747
Mob            : +86-13392706499
Url            : https://www.rochehandle.com
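The [0] indexing above raises IndexError as soon as one label is absent, which can happen on other press-release pages. Below is a minimal sketch, assuming the page text keeps the same "Label: value" layout. The helper extract_fields, the LABELS list and the SAMPLE text are hypothetical, made up for illustration. The helper returns None for a missing label instead of raising.

```python
import re

# Hypothetical sample shaped like get_text(strip=True, separator='\n') output;
# the real page text may differ.
SAMPLE = """Some press release text.
Company:
Dongguan Roche Industrial Co., Ltd
Contact:
Robin Luo
Tel:
0769-89366747
Mob:
+86-13392706499
"""

# Hypothetical list of the labels we care about
LABELS = ["Company", "Address", "Contact", "Email", "Tel", "Mob"]


def extract_fields(text, labels=LABELS):
    """Return {label: value or None} instead of raising IndexError on a missing label."""
    fields = {}
    for label in labels:
        # \s* after the colon also skips the newline when the value sits in its own text node
        match = re.search(rf"{re.escape(label)}\s*:\s*(.*)", text)
        fields[label] = match.group(1).strip() if match else None
    return fields


if __name__ == "__main__":
    for label, value in extract_fields(SAMPLE).items():
        print("{:<15}: {}".format(label, value))
```

In the real script you would pass txt instead of SAMPLE and then check for None before using a value.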
Upvotes: 2
Reputation: 1803
You can use a regular expression to parse the text of the <p> block you get from BeautifulSoup:
import re
text_from_p = """
some text
some more
Tel: 0234-234345-45
some more text
"""
# Capture the digits, hyphens and spaces after "Tel: " into the named group "tel"
match = re.search(r"Tel: (?P<tel>[0-9\- ]*)", text_from_p)
if match:
    print(match.group("tel"))
else:
    print("Tel not found")
You get:
0234-234345-45
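The same named-group idea can collect several contact fields in one pass. Below is a minimal sketch, assuming each field sits on its own line as "Label: value". The FIELD_RE pattern, the contact_fields helper and the Tel/Mob/Email label set are hypothetical choices for illustration. re.M makes ^ and $ match at every line boundary, and [ \t]* keeps each match on a single line.

```python
import re

# Hypothetical <p> text with one "Label: value" pair per line
text_from_p = """
some text
Tel: 0234-234345-45
Mob: +86-13392706499
some more text
"""

# One line per field; the label alternation is an assumption, extend it as needed
FIELD_RE = re.compile(
    r"^(?P<label>Tel|Mob|Email)[ \t]*:[ \t]*(?P<value>.+?)[ \t]*$", re.M
)


def contact_fields(text):
    """Map each matched label to its value; labels that never appear are simply absent."""
    return {m.group("label"): m.group("value") for m in FIELD_RE.finditer(text)}


print(contact_fields(text_from_p))
# {'Tel': '0234-234345-45', 'Mob': '+86-13392706499'}
```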
Upvotes: 1