Log In and Web Scrape with Python 3 but action='#' and possibly JavaScript

Question

I am trying to use Python 3 to scrape my data from Ancestry.com using BeautifulSoup and MechanicalSoup, but I am running into a few issues when trying to log in. Here is the form's HTML on Ancestry:

(form markup not preserved; visible labels: Email or Username, Password, [event])

The site's HTML form uses action='#', which, from what I've found, means the inputs are submitted to the current page. I also see an [event] badge, labeled 'event listener', and I think this implies JavaScript? If so, do I need a separate tool to log in?
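
To see what action='#' resolves to, here is a minimal sketch using only the standard library's urllib.parse.urljoin. The page URL is the sign-in page from this question; the variable names are illustrative assumptions.

```python
from urllib.parse import urljoin

# The page that contains the form (taken from the question).
page_url = 'https://www.ancestry.com/account/signin'

# A form's action is resolved against the URL of the page it sits on.
# '#' is only a fragment reference, so the target is still the same page.
target = urljoin(page_url, '#')
print(target)
```

A form written this way is typically submitted by a script attached to it rather than by a plain HTML submit, which would be consistent with the [event] badge, though that is an inference and not something the markup alone proves.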

BeautifulSoup cannot find the first of the two forms. The second form, which has action="", does appear.

from urllib.request import urlopen
from bs4 import BeautifulSoup

# specify the URL
quote_page = 'https://www.ancestry.com/account/signin?'
# query the website and return the HTML to the variable `page`
page = urlopen(quote_page)

# parse the HTML using Beautiful Soup and store it in the variable `soup`
soup = BeautifulSoup(page, 'html.parser')
len(soup.find_all('form'))  # Out: 1
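
One way to check which forms are really present in the HTML the server sends (as opposed to forms a script adds later in the browser) is to list each form's id and action. The sketch below is a minimal, standard-library stand-in for the BeautifulSoup call above. The sample HTML and the id "otherForm" are hypothetical and are not Ancestry's real markup.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collect the id and action attributes of every <form> start tag."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        if tag == 'form':
            attributes = dict(attrs)
            self.forms.append((attributes.get('id'), attributes.get('action')))


def list_forms(html):
    """Return a list of (id, action) pairs for the forms in an HTML string."""
    parser = FormLister()
    parser.feed(html)
    return parser.forms


# Hypothetical stand-in for the raw HTML returned by the server:
# only one form is in the markup, the other would be built by a script.
sample_html = '''
<html><body>
  <form id="otherForm" action=""></form>
  <script>/* a script would build the sign-in form here */</script>
</body></html>
'''

forms = list_forms(sample_html)
print(forms)
```

If a form shows up in the browser's inspector but not in a listing like this, it was most likely added by JavaScript after the page loaded, and a parser that only sees the raw HTML cannot select it.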

How can I interact with the first form? When I use browser.select_form('form[action="#"]') I get a LinkNotFoundError. My code:

# pip install beautifulsoup4 mechanicalsoup
import mechanicalsoup

browser = mechanicalsoup.StatefulBrowser()
browser.open('https://www.ancestry.com/account/signin?')
print(browser.get_url())

# Attempts to select the first form (action="#" id="signInForm"):
# browser.select_form('form[action="#" id="signInForm"]')  # not a valid CSS selector
# browser.select_form('form[action="#"]')  # raises LinkNotFoundError
# Only the second form can be selected:
browser.select_form('form[action=""]')

browser['username'] = 'USERNAME'
browser['password'] = 'PASSWORD'

browser.submit_selected()
print(browser.get_url())

I see a lot of support for mechanize, but that does not work with Python 3. I do not know how to check whether Ancestry.com is using JavaScript, because I can't engage the first form. I am a beginner, so please assume I know nothing, and I won't be offended. (I haven't found a tutorial covering action='#' because that query returns few results.)

(This person used a different strategy to log into Ancestry, but the site has been updated since this code was posted: https://github.com/freeseek/getmydnamatches/blob/master/getmyancestrydna.py. His code is a little too advanced for me at my level.)
