Reputation: 1
I am trying to use Python 3 to scrape my data from Ancestry.com using BeautifulSoup and MechanicalSoup, but I am running into a few issues trying to log in. Here is the form's HTML on Ancestry:
<form action="#" id="signInForm" method="post" class="form formLarge" onsubmit="return false" novalidate="novalidate" data-ui-id="ui1591467547206308">
<div class="ancGrid">
<div class="ancCol ancColRow w100">
<label id="usernameLabel" for="username" data-error-0="Required" data-error-1="Please enter a minimum of 5 characters for the username/email" data-error-2="Username/email contains invalid characters">
Email or Username
</label>
<input tabindex="1" aria-required="true" class="success required" id="username" maxlength="64" name="username" placeholder="Email Address or Username" type="text" value="" autocorrect="off" autocapitalize="off">
</div>
<div class="ancCol ancColRow w100">
<label id="passwordLabel" for="password" data-error-0="Required" data-error-1="Please enter a minimum of 5 characters for the password" data-error-2="Password contains invalid characters">
Password
</label>
BeautifulSoup cannot find this form, which is the first of two forms on the page. It only finds the second form, which has action="".
from urllib.request import urlopen
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# specify the url
quote_page = 'https://www.ancestry.com/account/signin?'
# query the website and return the html to the variable `page`
page = urlopen(quote_page)
# parse the html using Beautiful Soup and store it in the variable `soup`
soup = BeautifulSoup(page, 'html.parser')
len(soup.find_all('form'))  # Out: 1
How can I interact with the first form? When I use browser.select_form('form[action="#"]'), I get a LinkNotFoundError. My code:
# pip install beautifulsoup4 mechanicalsoup
import mechanicalsoup

browser = mechanicalsoup.StatefulBrowser()
browser.open('https://www.ancestry.com/account/signin?')
print(browser.get_url())
# Selectors I tried for the first form (action="#" id="signInForm"):
# browser.select_form('form[action="#" id="signInForm"]')
# browser.select_form('form[action="#"]')  # gives LinkNotFoundError
# Selecting the second form is the only thing that works:
browser.select_form('form[action=""]')
browser['username'] = 'USERNAME'
browser['password'] = 'PASSWORD'
browser.submit_selected()
print(browser.get_url())
I see a lot of answers that use mechanize, but as far as I can tell that does not work with Python 3. I do not know how to check whether Ancestry.com is using JavaScript or not, because I can't engage the first form. I am a beginner, so please assume I know nothing; I won't be offended. (I haven't found a tutorial that covers action='#', because that query returns few results.)
(This person used a different strategy to log into Ancestry, but the site has been updated since this code was posted: https://github.com/freeseek/getmydnamatches/blob/master/getmyancestrydna.py His code is a little too advanced for me at my level.)
Upvotes: 0
Views: 545
Reputation: 199
Please consider taking a look at requests-html: https://requests.readthedocs.io/projects/requests-html/en/latest/
It's very beginner-friendly and has JavaScript support, which may help if the sign-in form is only added to the page by a script.
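One possible explanation for the missing form (an assumption on my part, I have not confirmed how Ancestry builds its page) is that the sign-in form is created by JavaScript after the page loads, so it never appears in the raw HTML that urlopen or MechanicalSoup receive. The sketch below uses only the standard library and a made-up page (SAMPLE_PAGE, FormCollector and find_static_forms are hypothetical names, not part of any library) to show how a static parser sees only the form that is written directly in the markup:

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collect the attributes of every <form> start tag in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.forms.append(dict(attrs))


def find_static_forms(html):
    """Return one attribute dict per <form> present before any script runs."""
    collector = FormCollector()
    collector.feed(html)
    collector.close()
    return collector.forms


# Hypothetical page (not Ancestry's real markup): one form is written in the
# HTML, the other would only exist after a browser executes the script.
SAMPLE_PAGE = """
<html><body>
<form action="" id="otherForm"><input name="q"></form>
<div id="signInContainer"></div>
<script>
  var form = document.createElement("form");
  form.id = "signInForm";
  form.setAttribute("action", "#");
  document.getElementById("signInContainer").appendChild(form);
</script>
</body></html>
"""

forms = find_static_forms(SAMPLE_PAGE)
print(len(forms))                    # number of forms a static parser can see
print([f.get("id") for f in forms])  # ids of those forms
```

If the real page behaves like this sample, no CSS selector will find the first form in the downloaded HTML, and you would need something that actually runs the page's JavaScript before you look for it.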
Upvotes: 0