Reputation: 11
Hello guys, I need some help. I want to scrape the e-mail address from this website: https://ccrs.pmi.org/search/course-provider/1000000396?courseID=472010&courseName=Agile%20for%20Marketing
I have a problem with the inspected elements below: the e-mail doesn't show up in my code when I run the program:
<div class="col-xs-12">
<div class="separator-rule heading"></div>
<h4>Provider Main Contact</h4>
"
Klaus Stephan"
<br>
"
+49++49 16091922165"
<br>
"
[email protected]
"
</div>
Does anyone know how to extract the e-mail from this? Thanks for the help, guys.
Upvotes: 1
Views: 47
Reputation: 734
You can also use regex pattern matching to search the page text for an email address.
A very powerful expression can be found in this discussion: How to validate an email address using a regular expression?
import re
import requests
from bs4 import BeautifulSoup

url = 'https://ccrs.pmi.org/search/course-provider/1000000396?courseID=472010&courseName=Agile%20for%20Marketing'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
# Raw string so the regex engine, not Python, interprets the backslash escapes
pattern = r'(?:[a-z0-9!#$%&\'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&\'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])'
match = re.search(pattern, soup.get_text())
print(match[0])
> '[email protected]'
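If full RFC-style validation is more than you need, a much shorter pattern is usually enough to pull addresses out of page text. The sketch below is only an illustration: the simplified pattern, the `find_emails` helper, and the sample text (with a placeholder name, phone number, and address) are assumptions, not taken from the page above.

```python
import re

# Simplified pattern: local part, "@", dotted domain ending in a 2+ letter TLD.
# This is a convenience shortcut, not an RFC 5322 validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text):
    """Return every email-like substring in text, in order, without duplicates."""
    seen = []
    for candidate in EMAIL_RE.findall(text):
        if candidate not in seen:
            seen.append(candidate)
    return seen


# Hypothetical text standing in for soup.get_text(); all values are placeholders.
sample_text = """
Provider Main Contact
Jane Doe
+00 0000000000
contact@example.com
"""

print(find_emails(sample_text))
```

Using `findall` instead of `search` returns every address on the page, and it gives an empty list rather than `None` when nothing matches.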
Upvotes: 0
Reputation: 195438
To get emails from block "Provider Main Contact"
you can use this example:
import requests
from bs4 import BeautifulSoup
url = 'https://ccrs.pmi.org/search/course-provider/1000000396?courseID=472010&courseName=Agile%20for%20Marketing'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
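# Select the <div> whose direct child <h4> holds the heading text
# (older soupsieve releases spell this pseudo-class :contains)
main_contact_block = soup.select_one('div:has(> h4:-soup-contains("Provider Main Contact"))')
# Keep only the text nodes that look like an email address
emails = [text.strip() for text in main_contact_block.find_all(string=True) if '@' in text]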
print(emails)
Prints:
['[email protected]']
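To try the same "text nodes under the heading" idea offline, here is a rough sketch that uses only the standard library's `html.parser` instead of BeautifulSoup. The `ContactBlockParser` class is hypothetical, and the HTML is a stand-in modelled on the snippet in the question, with placeholder contact details.

```python
from html.parser import HTMLParser


class ContactBlockParser(HTMLParser):
    """Collect text nodes that follow a given <h4> heading inside its parent <div>."""

    def __init__(self, heading):
        super().__init__()
        self.heading = heading
        self.in_h4 = False
        self.capturing = False  # True once the heading has been seen
        self.depth = 0          # nesting depth of <div>s opened inside the block
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h4":
            self.in_h4 = True
        elif tag == "div" and self.capturing:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "h4":
            self.in_h4 = False
        elif tag == "div" and self.capturing:
            if self.depth == 0:
                self.capturing = False  # the heading's parent <div> has closed
            else:
                self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.in_h4:
            if text == self.heading:
                self.capturing = True
        elif self.capturing and text:
            self.texts.append(text)


# Placeholder markup shaped like the block in the question.
html = """
<div class="col-xs-12">
<div class="separator-rule heading"></div>
<h4>Provider Main Contact</h4>
Jane Doe
<br>
+00 0000000000
<br>
contact@example.com
</div>
<div>other@example.org</div>
"""

parser = ContactBlockParser("Provider Main Contact")
parser.feed(html)
emails = [text for text in parser.texts if "@" in text]
print(emails)
```

Only the address inside the "Provider Main Contact" block is kept; the one in the following `<div>` is ignored because capturing stops when the block's `<div>` closes.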
Upvotes: 1