SIM

Reputation: 22440

Unable to fetch an email link out of some script tag from a webpage

I've written a script in Python to scrape an email address from a webpage, but it doesn't work. The email address sits inside a script tag, and I can't get past that to fetch the content. Any help would be much appreciated.

Webpage link

I've tried so far with:

import requests
from bs4 import BeautifulSoup

url = "replace_with_link_above"

res = requests.get(url)
soup = BeautifulSoup(res.text, "lxml")
for items in soup.select(".profile-right-info"):
    email = items.select_one("dd a[href^='mailto:']")['href']
    print(email)

Upon execution I get the below error:

    email = items.select_one("dd a[href^='mailto:']")['href']
TypeError: 'NoneType' object is not subscriptable

Btw, the email link is in the second row under the "profile details" title on that webpage.

Upvotes: 2

Views: 189

Answers (1)

d2718nis

Reputation: 1269

You should check out the Network tab of the Chrome dev tools:

There is a block of code:

 <script language='JavaScript' type='text/javascript'>
 <!--
 var prefix = 'm&#97;&#105;lt&#111;:';
 var suffix = '';
 var attribs = '';
 var path = 'hr' + 'ef' + '=';
 var addy99716 = "R&#111;bz" + '&#64;';
 addy99716 = addy99716 + '&#97;ll&#105;nth&#101;p&#111;l&#105;sh' + '&#46;' + 'c&#111;m';
 document.write( '<a ' + path + '"' + prefix + addy99716 + suffix + '"' + attribs + '>' );
 document.write( addy99716 );
 document.write( '<\/a>' );
 //-->
 </script>

which evaluates to an <a> tag with an href attribute equal to:

m&#97;&#105;lt&#111;:R&#111;bz&#64;&#97;ll&#105;nth&#101;p&#111;l&#105;sh&#46;c&#111;m

Decoding the HTML entities gives mailto:Robz@allinthepolish.com. You can check this here: https://mothereff.in/html-entities
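The same decoding can be done locally with Python's standard library instead of an online tool. A minimal sketch, assuming you already have the entity-encoded href value as a string (the variable names are only illustrative):

```python
import html

# The entity-encoded href value produced by the script above
encoded = "m&#97;&#105;lt&#111;:R&#111;bz&#64;&#97;ll&#105;nth&#101;p&#111;l&#105;sh&#46;c&#111;m"

# html.unescape converts numeric character references such as &#97; into characters
decoded = html.unescape(encoded)
print(decoded)  # mailto:Robz@allinthepolish.com
```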

So, one option would be using something like Selenium as cgte proposed.

The other option is to get the contents of the <dd> tag, parse the JS code, and then either run it with the node executable (which could be dangerous unless you run it in a sandbox) or evaluate it manually. The Selenium option seems a lot simpler.
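For the "evaluate manually" route, here is a rough sketch that never executes the JavaScript. It assumes the script always has the shape shown above: a variable named addy followed by digits, built up from quoted string fragments over one or more assignments. The function name extract_email and the trimmed sample_script are hypothetical, and the regexes would need adjusting if the page generates its code differently.

```python
import html
import re


def extract_email(script_text):
    """Rebuild the address from the string literals assigned to the addyNNN variable."""
    # Find the obfuscated variable name, e.g. addy99716 (assumed naming pattern)
    match = re.search(r"var\s+(addy\d+)\s*=", script_text)
    if match is None:
        return None
    name = match.group(1)

    parts = []
    # Collect the right-hand side of every assignment to that variable.
    # The terminating ';' must be at the end of the line, so the ';' inside
    # entities such as &#111; does not cut the match short.
    pattern = rf"(?:var\s+)?{name}\s*=\s*(.*?);\s*$"
    for rhs in re.findall(pattern, script_text, flags=re.M):
        # Keep only the quoted fragments, in order; bare identifiers are skipped
        parts.extend(literal for _, literal in re.findall(r"""(['"])(.*?)\1""", rhs))

    # Join the fragments and decode the HTML entities
    return html.unescape("".join(parts))


# Trimmed copy of the script from the page, used here instead of a live request
sample_script = r"""
 var prefix = 'm&#97;&#105;lt&#111;:';
 var addy99716 = "R&#111;bz" + '&#64;';
 addy99716 = addy99716 + '&#97;ll&#105;nth&#101;p&#111;l&#105;sh' + '&#46;' + 'c&#111;m';
 document.write( addy99716 );
"""

print(extract_email(sample_script))  # Robz@allinthepolish.com
```

In a real scraper you would pass in the text of the script tag found inside the <dd> element rather than a hard-coded string.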

Upvotes: 2
