Unable to fetch an email link out of some script tag from a webpage

Question

I've written a script in Python to scrape an email address from a webpage, but I can't get it to work. The email address sits within a script tag and I can't get past that barrier to fetch the content. Any help to get it will be much appreciated.

Webpage link

Here is what I've tried so far:

import requests
from bs4 import BeautifulSoup

url = "replace_with_link_above"

res = requests.get(url)
soup = BeautifulSoup(res.text, "lxml")
for items in soup.select(".profile-right-info"):
    email = items.select_one("dd a[href^='mailto:']")['href']
    print(email)

Upon execution I get the below error:

    email = items.select_one("dd a[href^='mailto:']")['href']
TypeError: 'NoneType' object is not subscriptable

Btw, the email link is in the second row under the "profile details" title on that webpage.

d2718nis · Accepted Answer

You should check out the Network tab of the Chrome dev tools.

There is a block of code:

which evaluates to an anchor tag with an href attribute equal to:

mailto:Robz@allinthepolish.com

which is mailto:Robz@allinthepolish.com once you decode the HTML entities. You can check it here: https://mothereff.in/html-entities
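The same decoding can also be done locally with html.unescape from the Python standard library. This is only a minimal sketch: the encoded string is generated here for illustration, and the use of decimal character references is an assumption, since the page may encode the value differently.

```python
import html

# Assumption: the page stores the href as decimal character references
# (e.g. "&#109;" for "m"). The encoded sample is built here for illustration
# rather than copied from the page.
plain = "mailto:Robz@allinthepolish.com"
encoded = "".join(f"&#{ord(ch)};" for ch in plain)

# html.unescape turns numeric and named character references back into text.
decoded = html.unescape(encoded)

# Drop the "mailto:" scheme to keep only the address.
address = decoded[len("mailto:"):] if decoded.startswith("mailto:") else decoded

print(decoded)
print(address)
```

html.unescape handles decimal, hexadecimal and named references alike, so the exact form used by the page should not matter much.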

So, one option would be using something like Selenium as cgte proposed.

The other option is to get the contents of the script tag, parse the JS code and then either run it with the node executable (which could be dangerous unless you run it in a sandbox) or evaluate it manually. The Selenium option seems a lot simpler.
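If you want to try the manual route, here is a rough sketch that avoids running any JavaScript. It assumes the script body contains the whole entity-encoded mailto: value as one literal; if the page assembles the address from several pieces, this will not find it. ScriptCollector, find_mailto_in_scripts and the sample markup are hypothetical names and data made up for illustration, not taken from the page.

```python
import html
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the raw text of every script element in a page."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it.
        if self.in_script:
            self.scripts[-1] += data


def find_mailto_in_scripts(page_html):
    """Return the first mailto: link found in any script body, or None."""
    collector = ScriptCollector()
    collector.feed(page_html)
    collector.close()
    for script in collector.scripts:
        decoded = html.unescape(script)  # decode the HTML entities
        match = re.search(r'''mailto:[^"'<>\s\\]+''', decoded)
        if match:
            return match.group(0)
    return None


# Hypothetical markup standing in for the real page source.
encoded = "".join(f"&#{ord(ch)};" for ch in "mailto:Robz@allinthepolish.com")
sample = f'<dd><script>document.write("<a href=\'{encoded}\'>Email</a>");</script></dd>'

print(find_mailto_in_scripts(sample))
```

With the real page you would pass res.text from the requests call in the question instead of the sample string.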
