UnderConfident

Reputation: 41

How do you find a URL from an input button (web scraping)?

I'm scraping an ASP.NET website, and there is an input button that links to a page I need. How can I get the URL of that page without using browser automation like Selenium?

Note: I don't need to scrape the actual page; the URL itself contains all the info I need.

This is the code I used to get to the website, but I don't know where to start with scraping the button's URL:

import requests
from bs4 import BeautifulSoup

# `headers` was not shown in the original post; this value is only a placeholder.
headers = {'User-Agent': 'Mozilla/5.0'}

select_session_url = 'http://alisondb.legislature.state.al.us/Alison/SelectSession.aspx'
session = requests.Session()

# Select a legislative session through an ASP.NET postback.
session_payload = {"__EVENTTARGET": "ctl00$ContentPlaceHolder1$gvSessions", "__EVENTARGUMENT": "$3"}
session.post(select_session_url, data=session_payload, headers=headers)

# Switch to the Senate view of the prefiled bills.
senate_payload = {"__EVENTTARGET": "ctl00$ContentPlaceHolder1$btnSenate", "__EVENTARGUMENT": "Senate"}
session.post('http://alisondb.legislature.state.al.us/Alison/SessPrefiledBills.aspx', data=senate_payload, headers=headers)

page = session.get('http://alisondb.legislature.state.al.us/Alison/SESSBillsList.aspx?SELECTEDDAY=1:2019-03-05&BODY=1753&READINGTYPE=R1&READINGCODE=B&PREFILED=Y')
member_soup = BeautifulSoup(page.text, 'lxml')
member = member_soup.find_all('input', value='Jones')

The html for the button is below:

<input type="button" value="Jones" onclick="javascript:__doPostBack('ctl00$ContentPlaceHolder1$gvBills','SponsorName$47')" style="background-color:Transparent;border-color:Silver;border-style:Outset;font-size:Small;height:30px;width:100px;">

Upvotes: 0

Views: 717

Answers (1)

HedgeHog

Reputation: 25048

How to find the input's onclick?

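You were close, but you should replace your last line with the following, which returns the button's onclick attribute instead of a list of matching tags: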

member_soup.find('input', {"value" : "Jones"})['onclick']

Example

import requests
from bs4 import BeautifulSoup

# `headers` is not defined in the question; this value is only a placeholder.
headers = {'User-Agent': 'Mozilla/5.0'}

select_session_url = 'http://alisondb.legislature.state.al.us/Alison/SelectSession.aspx'
session = requests.Session()

# Pass the payload and headers by keyword: the third positional argument of post() is `json`, not `headers`.
session_payload = {"__EVENTTARGET": "ctl00$ContentPlaceHolder1$gvSessions", "__EVENTARGUMENT": "$3"}
session.post(select_session_url, data=session_payload, headers=headers)

senate_payload = {"__EVENTTARGET": "ctl00$ContentPlaceHolder1$btnSenate", "__EVENTARGUMENT": "Senate"}
session.post('http://alisondb.legislature.state.al.us/Alison/SessPrefiledBills.aspx', data=senate_payload, headers=headers)

page = session.get('http://alisondb.legislature.state.al.us/Alison/SESSBillsList.aspx?SELECTEDDAY=1:2019-03-05&BODY=1753&READINGTYPE=R1&READINGCODE=B&PREFILED=Y')
member_soup = BeautifulSoup(page.text, 'lxml')

# find() returns the first matching tag; indexing it reads the onclick attribute.
member = member_soup.find('input', {"value": "Jones"})['onclick']
member  # in a REPL or notebook this displays the string

Output

"javascript:__doPostBack('ctl00$ContentPlaceHolder1$gvBills','SponsorName$39')"
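The onclick value is not a URL yet; it only names the postback's event target and event argument. As a minimal sketch, those two pieces could be pulled out with a regular expression. The helper name `parse_postback` is made up for illustration, and the pattern assumes both arguments are single-quoted strings as in the output above:

```python
import re

# Matches __doPostBack('<target>','<argument>') with single-quoted arguments.
POSTBACK_RE = re.compile(r"__doPostBack\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)")


def parse_postback(onclick):
    """Return (event_target, event_argument) from an onclick attribute (hypothetical helper)."""
    match = POSTBACK_RE.search(onclick)
    if match is None:
        raise ValueError("no __doPostBack call found in: %r" % onclick)
    return match.group(1), match.group(2)


onclick = "javascript:__doPostBack('ctl00$ContentPlaceHolder1$gvBills','SponsorName$39')"
target, argument = parse_postback(onclick)
print(target)    # ctl00$ContentPlaceHolder1$gvBills
print(argument)  # SponsorName$39
```

These are the same kind of values you already send as `__EVENTTARGET` and `__EVENTARGUMENT` in your payloads.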


Edit

You may be interested in how to get started with Selenium:

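from time import sleep

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Raw string so the backslashes in the Windows path are not read as escapes.
service = Service(r'C:\Program Files\ChromeDriver\chromedriver.exe')
browser = webdriver.Chrome(service=service)
browser.get('http://alisondb.legislature.state.al.us/Alison/SelectSession.aspx')

# Selenium 4 locators: find_element(By..., ...) replaces the removed find_element_by_* helpers.
sleep(0.9)
browser.find_element(By.LINK_TEXT, 'Regular Session 2019').click()

sleep(0.9)
browser.find_element(By.LINK_TEXT, 'Prefiled Bills').click()

sleep(2)
browser.find_element(By.CSS_SELECTOR, 'input[value="Senate"]').click()

sleep(2)
browser.find_element(By.CSS_SELECTOR, 'input[value="Jones"]').click()

sleep(2)
print(browser.current_url)

# quit() closes the window and also shuts down the driver process.
browser.quit()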

Output

http://alisondb.legislature.state.al.us/Alison/Member.aspx?SPONSOR=Jones&BODY=1753&SPONSOR_OID=100453
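If you want to stay without Selenium, one thing you could try is replaying the button's postback with requests and reading where the server sends you. The sketch below is untested against this site and rests on two assumptions: that the page expects its hidden form fields (such as `__VIEWSTATE`) to be posted back, and that the server answers the postback with an HTTP redirect to the Member.aspx URL. The names `HiddenFieldParser`, `build_postback_payload` and `resolve_postback_url` are made up for illustration; `session` is expected to behave like the `requests.Session` used above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_postback_payload(html, target, argument):
    """Copy the page's hidden fields and set the postback target and argument."""
    parser = HiddenFieldParser()
    parser.feed(html)
    payload = dict(parser.fields)
    payload["__EVENTTARGET"] = target
    payload["__EVENTARGUMENT"] = argument
    return payload


def resolve_postback_url(session, page_url, html, target, argument):
    """Post the payload without following redirects; return the redirect URL or None."""
    payload = build_postback_payload(html, target, argument)
    response = session.post(page_url, data=payload, allow_redirects=False)
    location = response.headers.get("Location")
    # The Location header may be relative, so resolve it against the page URL.
    return urljoin(page_url, location) if location else None
```

With the objects from the requests example, the call would look like `resolve_postback_url(session, page.url, page.text, 'ctl00$ContentPlaceHolder1$gvBills', 'SponsorName$39')`. Not following the redirect means the target page is never downloaded, which fits your note that only the URL matters. If it returns `None`, the site probably navigates some other way and this approach does not apply.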

Upvotes: 1
