nikolanidic

Reputation: 11

How can I scrape an e-mail address from this page with Python?

Hello, I need some help. I want to scrape the e-mail address from this website: https://ccrs.pmi.org/search/course-provider/1000000396?courseID=472010&courseName=Agile%20for%20Marketing

I have a problem with the element below (copied from the browser's inspector): the e-mail address doesn't show up in my output when I run the program:

<div class="col-xs-12">
  <div class="separator-rule heading"></div>
  <h4>Provider Main Contact</h4>
  "
                              Klaus Stephan"
  <br>
  "
                              +49++49 16091922165"
  <br>
  "
                              [email protected]
                          "
</div>

Does anyone know how to extract the e-mail address from this? Thanks for the help.

Upvotes: 1

Views: 47

Answers (2)

Marc

Reputation: 734

You can also use regular-expression pattern matching to search the page text for an e-mail address.

A very powerful expression can be found in this discussion: How to validate an email address using a regular expression?

import re
import requests
from bs4 import BeautifulSoup

url = 'https://ccrs.pmi.org/search/course-provider/1000000396?courseID=472010&courseName=Agile%20for%20Marketing'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
# raw string, so the backslash escapes reach the regex engine unchanged
pattern = r'(?:[a-z0-9!#$%&\'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&\'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])'
match = re.search(pattern, soup.get_text(), re.IGNORECASE)  # the pattern only lists lowercase letters
print(match[0] if match else 'No e-mail address found')

> '[email protected]'
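To try the regex idea without requesting the live page, here is a minimal offline sketch. It assumes the contact block looks like the snippet in the question; the name, phone number and contact@example.com address in it are hypothetical placeholders. It also swaps in a deliberately simple pattern (not the full RFC 5322 expression) and the standard library's html.parser in place of BeautifulSoup, so it needs no third-party packages.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup modelled on the snippet in the question;
# the name, phone number and address are placeholders.
HTML = """
<div class="col-xs-12">
  <div class="separator-rule heading"></div>
  <h4>Provider Main Contact</h4>
  Jane Doe
  <br>
  +49 000 0000000
  <br>
  contact@example.com
</div>
"""

# Deliberately simple pattern (not RFC 5322): local part, "@", dotted domain.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class TextCollector(HTMLParser):
    """Collect every text node, roughly like BeautifulSoup's get_text()."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def find_emails(html):
    """Return all e-mail-like substrings found in the document's text."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return EMAIL_RE.findall(" ".join(collector.chunks))


print(find_emails(HTML))  # ['contact@example.com']
```

Searching the whole page text is simple, but it returns every address on the page, not just the provider's main contact.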

Upvotes: 0

Andrej Kesely

Reputation: 195438

To get the e-mails from the "Provider Main Contact" block, you can use this example:

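import requests
from bs4 import BeautifulSoup


url = 'https://ccrs.pmi.org/search/course-provider/1000000396?courseID=472010&courseName=Agile%20for%20Marketing'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# the <div> whose direct child <h4> contains the heading text
# (:-soup-contains is soupsieve's current name for the deprecated :contains)
main_contact_block = soup.select_one('div:has(> h4:-soup-contains("Provider Main Contact"))')

# keep only the text nodes of that block that contain an "@"
emails = [text.strip() for text in main_contact_block.find_all(string=True) if '@' in text]
print(emails)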

Prints:

['[email protected]']
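The same scoping idea (find the <div> whose <h4> reads "Provider Main Contact", then keep only that block's text nodes containing "@") can also be sketched with the standard library's html.parser, for example if you cannot install bs4. ContactBlockParser and provider_emails are hypothetical helpers, not part of BeautifulSoup; the markup and the contact@example.com / billing@example.com addresses are placeholders. The sketch only looks at text directly inside the matching <div>, which is enough for the snippet in the question.

```python
from html.parser import HTMLParser

# Hypothetical markup modelled on the question's snippet; all contact
# details are placeholders. The second block is a decoy.
HTML = """
<div class="col-xs-12">
  <div class="separator-rule heading"></div>
  <h4>Provider Main Contact</h4>
  Jane Doe
  <br>
  +49 000 0000000
  <br>
  contact@example.com
</div>
<div class="col-xs-12">
  <h4>Billing Contact</h4>
  billing@example.com
</div>
"""


class ContactBlockParser(HTMLParser):
    """Hypothetical helper: collect the text of each <div> whose <h4> contains `heading`."""

    def __init__(self, heading):
        super().__init__()
        self.heading = heading
        self.stack = []    # one entry per open <div>: its text nodes and a match flag
        self.in_h4 = False
        self.blocks = []   # text-node lists of the <div>s that matched

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.stack.append({"texts": [], "match": False})
        elif tag == "h4":
            self.in_h4 = True

    def handle_endtag(self, tag):
        if tag == "h4":
            self.in_h4 = False
        elif tag == "div" and self.stack:
            block = self.stack.pop()
            if block["match"]:
                self.blocks.append(block["texts"])

    def handle_data(self, data):
        if not self.stack:
            return  # text outside any <div>
        current = self.stack[-1]  # innermost open <div>
        if self.in_h4 and self.heading in data:
            current["match"] = True
        current["texts"].append(data)


def provider_emails(html, heading="Provider Main Contact"):
    """Return stripped text nodes containing '@' from the matching block(s)."""
    parser = ContactBlockParser(heading)
    parser.feed(html)
    parser.close()
    return [text.strip() for texts in parser.blocks for text in texts if "@" in text]


print(provider_emails(HTML))  # ['contact@example.com']
```

The "Billing Contact" block shows that addresses outside the "Provider Main Contact" block are ignored, which is the advantage of scoping to the block instead of searching the whole page text.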

Upvotes: 1
