Reputation: 164
I've written a Python script using the requests module to fetch the information that is displayed after filling in a form with this email address: [email protected]. The problem is that when I hit the search button, the results open in a new tab containing all the information I wish to grab, and I don't see any matching request in the All tab under the Network section of Chrome dev tools. So I'm at a loss as to how to get the information using the requests module.
Steps to populate the result manually:
Put this email address, [email protected], in the input box labeled Email address and hit the Search button.
I've tried with:
import requests
from bs4 import BeautifulSoup
url = "https://eds.nd.edu/search/index.shtml"
post_url = "https://eds.nd.edu/cgi-bin/nd_ldap_search.pl"
res = requests.get(url,headers={"User-Agent":"Mozilla/5.0"})
soup = BeautifulSoup(res.text,"lxml")
payload = {item['name']:item.get('value','') for item in soup.select('input[name]')}
payload['email'] = '[email protected]'
del payload['clear']
resp = requests.post(post_url,data=payload)
print(resp.content)
The above script is a faulty approach, and I can't figure out how to grab the information connected to that email.
P.S. I'm not after a Selenium-based solution.
Upvotes: 0
Views: 57
Reputation: 11101
Ok, solved it:
from urllib.parse import quote

import requests


def get_contact_html(email: str):
    # Percent-encode the LDAP base, attribute list, scope and filter
    # that make up the path of the LDAP URL.
    encoded = quote('o=\"University of Notre Dame\", '
                    'st=Indiana, '
                    'c=US?displayName,edupersonaffiliation,ndTitle,ndDepartment,postalAddress,telephoneNumber,mail,searchGuide,labeledURI,'
                    'uid?'
                    'sub?'
                    f'(&(ndMail=*{email}*))')
    data = {
        "ldapurl": f'LDAP://directory.nd.edu:389/{encoded}',
        "ldaphost": "directory.nd.edu",
        "ldapport": '389',
        # Real spaces here, not "+": requests encodes the value itself.
        "ldapbase": 'o="University of Notre Dame", st=Indiana, c=US',
        "ldapfilter": f'(&(ndMail=*{email}*))',
        "ldapheadattr": "displayname",
        "displayformat": "nd",
        "ldapmask": "",
        "ldapscope": "",
        "ldapsort": "",
        "ldapmailattr": "",
        "ldapurlattr": "",
        "ldapaltattr": "",
        "ldapjpgattr": "",
        "ldapdnattr": "",
    }
    res = requests.post('https://eds.nd.edu/cgi-bin/nd_ldap_search.pl',
                        data=data)
    # Fail loudly on HTTP 4xx/5xx instead of returning an error page.
    res.raise_for_status()
    return res.text


if __name__ == '__main__':
    html = get_contact_html('[email protected]')
    print(html)
output:
...
Formal Name:
...
Aaron D Frick
...
This will give you the HTML for the page.
The trick was converting the encoded spaces (+) to real spaces in the
"ldapbase": 'o="University of Notre Dame", st=Indiana, c=US',
field and letting the requests module encode the value itself. Otherwise the + signs get encoded a second time (as %2B), so the server receives literal plus signs instead of spaces.
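To see the double-encoding effect without hitting the server, here is a minimal sketch using only the standard library. It assumes that requests form-encodes a dict passed as `data` the same way `urllib.parse.urlencode` does (that is my understanding of its behaviour, not something I checked against this site); the helper name `form_body` is made up for the illustration.

```python
from urllib.parse import parse_qs, urlencode


def form_body(value: str) -> str:
    """Form-encode a single ldapbase field (hypothetical helper)."""
    return urlencode({"ldapbase": value})


# Value with real spaces, as used in the working script.
raw = 'o="University of Notre Dame", st=Indiana, c=US'
# Value copied from an already-encoded request body, where spaces appear as "+".
pre_encoded = raw.replace(" ", "+")

good = form_body(raw)          # spaces become "+"
bad = form_body(pre_encoded)   # literal "+" becomes "%2B"

print(good)
print(bad)

# What a server would decode from each body:
print(parse_qs(good)["ldapbase"][0])  # spaces restored
print(parse_qs(bad)["ldapbase"][0])   # still contains "+" characters
```

So if you copy the field value out of a captured request body, decode it first (or type the spaces by hand) before handing it to requests.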
Upvotes: 1