MITHU

Reputation: 164

Unable to grab some information displayed in a new tab

I've written a Python script using the requests module to fetch some information that is displayed after filling in a form with the email [email protected]. The problem is that when I hit the Search button, the results open in a new tab containing all the information I wish to grab. Moreover, I don't see any matching request in the All tab under the Network section of Chrome DevTools. So, I'm at a loss as to how I can get the information using the requests module.

website address

Steps to populate the result manually:

Enter the email address [email protected] in the input box labeled Email address and hit the Search button.

I've tried with:

import requests
from bs4 import BeautifulSoup

url = "https://eds.nd.edu/search/index.shtml"
post_url = "https://eds.nd.edu/cgi-bin/nd_ldap_search.pl"

res = requests.get(url,headers={"User-Agent":"Mozilla/5.0"})
soup = BeautifulSoup(res.text,"lxml")
payload = {item['name']:item.get('value','') for item in soup.select('input[name]')}
payload['email'] = '[email protected]'
del payload['clear']

resp = requests.post(post_url,data=payload)
print(resp.content)

I know the script above is a faulty approach, but I can't figure out how to grab the information connected to that email.

P.S. I'm not after a Selenium-based solution.

Upvotes: 0

Views: 57

Answers (1)

abdusco

Reputation: 11101

Ok, solved it:

from urllib.parse import quote

import requests


def get_contact_html(email: str):
    encoded = quote('o=\"University of Notre Dame\", '
                    'st=Indiana, '
                    'c=US?displayName,edupersonaffiliation,ndTitle,ndDepartment,postalAddress,telephoneNumber,mail,searchGuide,labeledURI,'
                    'uid?'
                    'sub?'
                    f'(&(ndMail=*{email}*))')
    data = {
        "ldapurl": f'LDAP://directory.nd.edu:389/{encoded}',
        "ldaphost": "directory.nd.edu",
        "ldapport": '389',
        "ldapbase": 'o="University of Notre Dame", st=Indiana, c=US',
        "ldapfilter": f'(&(ndMail=*{email}*))',
        "ldapheadattr": "displayname",
        "displayformat": "nd",
        "ldapmask": "",
        "ldapscope": "",
        "ldapsort": "",
        "ldapmailattr": "",
        "ldapurlattr": "",
        "ldapaltattr": "",
        "ldapjpgattr": "",
        "ldapdnattr": "",
    }
    res = requests.post('https://eds.nd.edu/cgi-bin/nd_ldap_search.pl',
                        data=data)
    res.raise_for_status()
    return res.text


if __name__ == '__main__':
    html = get_contact_html('[email protected]')
    print(html)

Output:

...
Formal Name:
...
Aaron D Frick
...

This will give you the HTML of the results page. The trick was to replace the encoded spaces (+) with real spaces in the "ldapbase": 'o="University of Notre Dame", st=Indiana, c=US' field and let the requests module encode the value itself. Otherwise, the + signs get double-encoded.
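To illustrate the double-encoding problem, here is a minimal sketch using only the standard library. It assumes that requests form-encodes a dict passed as data in much the same way as urllib.parse.urlencode does; the variable names are just for illustration.

```python
from urllib.parse import parse_qs, urlencode

# Value as copied from the browser, with spaces already encoded as "+"
pre_encoded = 'o="University+of+Notre+Dame",+st=Indiana,+c=US'
# Same value with real spaces, as used in the answer above
raw = 'o="University of Notre Dame", st=Indiana, c=US'

# Form-encode both values the way an HTTP client would before sending
double = urlencode({"ldapbase": pre_encoded})
single = urlencode({"ldapbase": raw})

print(double)  # every "+" is escaped again and shows up as %2B
print(single)  # spaces are encoded once, as "+"

# What a server would decode from each request body
print(parse_qs(double)["ldapbase"][0])  # literal "+" characters remain
print(parse_qs(single)["ldapbase"][0])  # real spaces, the intended value
```

With the pre-encoded value, the decoded field still contains literal + characters instead of spaces, so it no longer matches the search base used in the answer. Passing the raw string and leaving the encoding to the client avoids that.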

Upvotes: 1
