Reputation: 41
I'm webscraping a asp.net website, and there is a input button that links to a page I need. I'm wondering how I can get the url to the site without using automation like Selenium.
Note: I don't need to scrape the actual page, the url contains all the info I need.
This is the code I used to get to the website but I don't know where to start with scraping the button url:
select_session_url = 'http://alisondb.legislature.state.al.us/Alison/SelectSession.aspx'
session = requests.Session()
session_payload = {"__EVENTTARGET":"ctl00$ContentPlaceHolder1$gvSessions", "__EVENTARGUMENT": "$3"}
session.post(select_session_url, session_payload, headers)
senate_payload = {"__EVENTTARGET":"ctl00$ContentPlaceHolder1$btnSenate", "__EVENTARGUMENT": "Senate"}
session.post('http://alisondb.legislature.state.al.us/Alison/SessPrefiledBills.aspx', senate_payload, headers)
page = session.get('http://alisondb.legislature.state.al.us/Alison/SESSBillsList.aspx?SELECTEDDAY=1:2019-03-05&BODY=1753&READINGTYPE=R1&READINGCODE=B&PREFILED=Y')
member_soup = BeautifulSoup(page.text, 'lxml')
member = member_soup.find_all('input', value='Jones')
The html for the button is below:
<input type="button" value="Jones" onclick="javascript:__doPostBack('ctl00$ContentPlaceHolder1$gvBills','SponsorName$47')" style="background-color:Transparent;border-color:Silver;border-style:Outset;font-size:Small;height:30px;width:100px;">
Upvotes: 0
Views: 717
Reputation: 25048
onclick
?You were close by but should replace your line with:
member_soup.find('input', {"value" : "Jones"})['onclick']
Example
import requests
from bs4 import BeautifulSoup
select_session_url = 'http://alisondb.legislature.state.al.us/Alison/SelectSession.aspx'
session = requests.Session()
session_payload = {"__EVENTTARGET":"ctl00$ContentPlaceHolder1$gvSessions", "__EVENTARGUMENT": "$3"}
session.post(select_session_url, session_payload, headers)
senate_payload = {"__EVENTTARGET":"ctl00$ContentPlaceHolder1$btnSenate", "__EVENTARGUMENT": "Senate"}
session.post('http://alisondb.legislature.state.al.us/Alison/SessPrefiledBills.aspx', senate_payload, headers)
page = session.get('http://alisondb.legislature.state.al.us/Alison/SESSBillsList.aspx?SELECTEDDAY=1:2019-03-05&BODY=1753&READINGTYPE=R1&READINGCODE=B&PREFILED=Y')
member_soup = BeautifulSoup(page.text, 'lxml')
member = member_soup.find('input', {"value" : "Jones"})['onclick']
member
Output
"javascript:__doPostBack('ctl00$ContentPlaceHolder1$gvBills','SponsorName$39')"
You may interested how to start with selenium ...
from selenium import webdriver
from time import sleep
browser = webdriver.Chrome('C:\Program Files\ChromeDriver\chromedriver.exe')
browser.get('http://alisondb.legislature.state.al.us/Alison/SelectSession.aspx')
sleep(0.9)
browser.find_element_by_link_text('Regular Session 2019').click()
sleep(0.9)
browser.find_element_by_link_text('Prefiled Bills').click()
sleep(2)
browser.find_element_by_css_selector('input[value="Senate"]').click()
sleep(2)
browser.find_element_by_css_selector('input[value="Jones"]').click()
sleep(2)
print(browser.current_url)
browser.close()
Output
http://alisondb.legislature.state.al.us/Alison/Member.aspx?SPONSOR=Jones&BODY=1753&SPONSOR_OID=100453
Upvotes: 1