Kelvin Njeri
Kelvin Njeri

Reputation: 159

Scraping data from a site that has no form tag but a text input using python

I'm working on a python program to scrape data from here. I've had successes before but this one poses a challenge for me. I am using beautiful soup and mechanize. I need to be able to input a zip code in the text box to produce the results here's I'm after.

Here's the snippet that contains the input text box:

<div id="ContentPlaceHolder1_C001_pnlFindACenter" onkeypress="javascript:return WebForm_FireDefaultButton(event, 'ContentPlaceHolder1_C001_btnSearchClient')">
		
        <div style="width: 400px; float: left; padding-top: 5px;">
            <label for="ContentPlaceHolder1_C001_tbUserAddress" style="font-family: Arial; font-size: 13.3333px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; line-height: normal; text-decoration: none; text-transform: none; color: rgb(0, 0, 0); cursor: auto; display: inline-block; position: relative; z-index: 100; margin-right: -121px; left: 2px; top: 0px; opacity: 1;">Address, City or Zip:</label><input name="ctl00$ContentPlaceHolder1$C001$tbUserAddress" type="text" id="ContentPlaceHolder1_C001_tbUserAddress" class="textInField" style="width: 240px; background-image: url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAASCAYAAABSO15qAAAAAXNSR0IArs4c6QAAAPhJREFUOBHlU70KgzAQPlMhEvoQTg6OPoOjT+JWOnRqkUKHgqWP4OQbOPokTk6OTkVULNSLVc62oJmbIdzd95NcuGjX2/3YVI/Ts+t0WLE2ut5xsQ0O+90F6UxFjAI8qNcEGONia08e6MNONYwCS7EQAizLmtGUDEzTBNd1fxsYhjEBnHPQNG3KKTYV34F8ec/zwHEciOMYyrIE3/ehKAqIoggo9inGXKmFXwbyBkmSQJqmUNe15IRhCG3byphitm1/eUzDM4qR0TTNjEixGdAnSi3keS5vSk2UDKqqgizLqB4YzvassiKhGtZ/jDMtLOnHz7TE+yf8BaDZXA509yeBAAAAAElFTkSuQmCC&quot;); background-repeat: no-repeat; background-attachment: scroll; background-size: 16px 18px; background-position: 98% 50%; cursor: auto;" data-hasqtip="21" oldtitle="Address, City or Zip:" title="" autocomplete="off" aria-describedby="qtip-21">
            <div id="divDistance" style="display: inline;">
                &nbsp;&nbsp;within&nbsp;&nbsp;
            <select name="ctl00$ContentPlaceHolder1$C001$ddlRadius" id="ContentPlaceHolder1_C001_ddlRadius">
			<option value="5">5</option>
			<option value="10">10</option>
			<option selected="selected" value="25">25</option>
			<option value="50">50</option>
			<option value="100">100</option>

		</select>
                miles
            </div>
        </div>
        <div style="width: 160px; float: left;">
            &nbsp;&nbsp;&nbsp;
            <input type="submit" name="ctl00$ContentPlaceHolder1$C001$btnSearchClient" value="Search" onclick="GeocodeLocation();return false;WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(&quot;ctl00$ContentPlaceHolder1$C001$btnSearchClient&quot;, &quot;&quot;, false, &quot;&quot;, &quot;find-a-center&quot;, false, false))" id="ContentPlaceHolder1_C001_btnSearchClient" class="btnCenter">
        </div>
        <div style="clear: both;">
        </div>
        <div>
            <span onchange="" style="font-size:12px;display: inline;" data-hasqtip="22" oldtitle="<b>AASM SleepTM</b> is an innovative telemedicine system that brings your sleep doctor to you. Featuring a secure, web-based video platform, AASM SleepTM allows you to meet with your sleep doctor from a distance. These live video visits will save you time and money. AASM SleepTM also syncs with Fitbit sleep data and has an integrated sleep diary, enabling you and your doctor to monitor your sleep." title="" aria-describedby="qtip-22"><input id="ContentPlaceHolder1_C001_chkSleepTM" type="checkbox" name="ctl00$ContentPlaceHolder1$C001$chkSleepTM"><label for="ContentPlaceHolder1_C001_chkSleepTM">Only show AASM SleepTM capable sleep centers in my state</label></span>
            <a href="https://sleeptm.com/" style="font-size: 10px; margin-left: 10px; display: inline;" target="_blank" data-hasqtip="23" oldtitle="<b>AASM SleepTM</b> is an innovative telemedicine system that brings your sleep doctor to you. Featuring a secure, web-based video platform, AASM SleepTM allows you to meet with your sleep doctor from a distance. These live video visits will save you time and money. AASM SleepTM also syncs with Fitbit sleep data and has an integrated sleep diary, enabling you and your doctor to monitor your sleep." title="" aria-describedby="qtip-23">What is AASM SleepTM?</a>
        </div>
    
	</div>

So far these are my attempts

url = 'http://www.sleepeducation.org/find-a-facility'
MILES = '100'
CODE = '33060'

attempt one

first = urllib2.Request(url,
                   data=urllib.urlencode({'value': CODE}),
                   headers={'User-Agent' : 'Google Chrome'                             'Cookie': 'name = ctl00$ContentPlaceHolder1$C001$tbUserAddress'})

attempt two

post_params = {
       'ctl00$ContentPlaceHolder1$C001$tbUserAddress': CODE
}
first = urllib.urlencode(post_params)

driver = webdriver.Chrome()
driver.get(url)
sbox = driver.find_element_by_class_name("ctl00$ContentPlaceHolder1$C001$tbUserAddress")
sbox.send_keys(CODE)
        driver.find_element_by_class_name("ctl00$ContentPlaceHolder1$C001$btnSearchClient").click()

attempt 3

br = mechanize.Browser()
br.open(url)
br.select_form(name='ctl00$ContentPlaceHolder1$C001$tbUserAddress')
br['value'] = CODE
br.submit()

http = urllib2.urlopen(br.response())
soup = BeautifulSoup(http, "html5lib")

Error = "no form matching name 'ctl00$ContentPlaceHolder1$C001$tbUserAddress'"


attempt 4

soup.find('input', {'name': 'ctl00$ContentPlaceHolder1$C001$tbUserAddress'})['value'] = CODE
soup.find('input', {'name': 'ctl00$ContentPlaceHolder1$C001$btnSearchClient'}).click()

Upvotes: 1

Views: 2001

Answers (2)

Kelvin Njeri
Kelvin Njeri

Reputation: 159

This worked for me

from bs4 import BeautifulSoup
from selenium import webdriver

url = 'http://www.sleepeducation.org/find-a-facility'
subButton = 'ContentPlaceHolder1_C001_btnSearchClient'
addyName = 'ctl00$ContentPlaceHolder1$C001$tbUserAddress'
addyId = 'ContentPlaceHolder1_C001_tbUserAddress'

def usingChromeSelenium():
    driver = webdriver.Chrome('C:\Users\documents\chromedriver.exe')
    driver.get(url)
    sleep(1)
    driver.find_element_by_name(addyName).send_keys(CODE)
    driver.find_element_by_id(subButton).click()
    sleep(1)
    html = driver.page_source
    return html

results = usingChromeSelenium()
soup = BeautifulSoup(results, "html.parser")

for "webdriver.Chrome()" you have to download chrome.exe application file and include the path to the file within the parenthesis and it might work for you without the path

Upvotes: 0

Bohdan Biloshytskiy
Bohdan Biloshytskiy

Reputation: 405

If i correctly understand your question, you want to send request, with specific params, and check response. Okay, lets look what request where sent after submit. Lets open Postman.Post request params

As we can see ctl00$ContentPlaceHolder1$C001$tbUserAddress get value 100, and ctl00$ContentPlaceHolder1$T6B6681F0008$ddlRadius, ctl00$ContentPlaceHolder1$C001$ddlRadius, ctl00$cphTopBar$T917BC451013$rblRadius got radius value 25.

So let's get a little snippet with data to send post request and get required response

I use python requests

And lxml to parse html response

I prefere lxml, it's a harder to understand but much faster than BeautifulSoup.

import requests
from lxml import html

input_data = {
    'ctl00$cphTopBar$T917BC451013$rblRadius': 25,
    'ctl00$ContentPlaceHolder1$T6B6681F0008$ddlRadius': 25,
    'ctl00$ContentPlaceHolder1$C001$ddlRadius': 25,
    'ctl00$ContentPlaceHolder1$C001$tbUserAddress': 100
}
resp = requests.post('http://www.sleepeducation.org/find-a-facility', data=input_data)
tree = html.fromstring(resp.text)
print(tree.xpath('//div[@id="ContentPlaceHolder1_C001_map_canvas"]')[0])

I dont have enough reputation to put links to documentation, i'll try to put them in comments, or you can just google python requests and python lxml Also U can do it with BeautifulSoup

import BeautifulSoup
import requests

input_data = {
        'ctl00$cphTopBar$T917BC451013$rblRadius': 25,
        'ctl00$ContentPlaceHolder1$T6B6681F0008$ddlRadius': 25,
        'ctl00$ContentPlaceHolder1$C001$ddlRadius': 25,
        'ctl00$ContentPlaceHolder1$C001$tbUserAddress': 100
    }
resp = requests.post('http://www.sleepeducation.org/find-a-facility', data=input_data)
soup = BeautifulSoup.BeautifulSoup(resp.text)
soup.find("div", {"id": "ContentPlaceHolder1_C001_map_canvas"})

Upvotes: 1

Related Questions