robots.txt

Reputation: 137

Unable to parse a number to be used within a link

I've created a script in Python to get the value of Tax District from a webpage. On its main page there is a form to fill in to generate the result containing the information I'm looking for. When I use my script below, I get the desired result, but I had to use a different link to parse it. The link I used within my script only becomes available once the form has been filled in, and that newly generated link contains a number which I can't figure out how to find.

Main link

In the search form there is a radio button Street Address which is selected by default. Then:

house number: 5587 (just above Exact/Low)
street name: Surrey

This is the link https://wedge.hcauditor.org/view/re/5500171005200/2018/summary that is generated automatically, and it contains the number 5500171005200.

I've written the following script to get the result, but I really don't know how the number in that URL is generated, as it changes when I use different search terms:

import requests
from bs4 import BeautifulSoup

url = 'https://wedge.hcauditor.org/view/re/5500171005200/2018/summary'

r = requests.get(url)
soup = BeautifulSoup(r.text, "lxml")
# Grab the value in the div that follows the "Tax District" label
item = soup.select_one("div:contains('Tax District') + div").text
print(item)

How can I get the number used within the newly generated link?

Upvotes: 2

Views: 52

Answers (1)

QHarr

Reputation: 84465

It seems a POST followed by a GET is fine, so there is no need to look for that other number. I use a Session to pass cookies between the requests. The link you reference is, however, found within the GET response.

import requests
from bs4 import BeautifulSoup as bs

# Form fields as submitted by the search page
data = {
    'search_type': 'Address',
    'sort_column': 'Address',
    'site_house_number_low': 5587,
    'site_house_number_high': '',
    'site_street_name': 'surrey'
}

with requests.Session() as s:
    # POST the search form; the session carries the cookies forward
    r = s.post('https://wedge.hcauditor.org/execute', data=data)
    # GET the first result, which the server ties to this session
    r = s.get('https://wedge.hcauditor.org/view_result/0')
    soup = bs(r.content, 'lxml')
    print(soup.select_one('.label + div').text)

You can see the details and request sequence in the captured web traffic. I happened to use Fiddler here.
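If you ever do need the parcel number itself (for example, to build the /view/re/.../summary link directly), it can be pulled out of a summary URL with a regex. This is a minimal sketch, assuming the URL always follows the /view/re/&lt;number&gt;/&lt;year&gt;/summary pattern; `parcel_from_url` is a hypothetical helper, not part of the site's API:

```python
import re

def parcel_from_url(url):
    """Extract the parcel number from a wedge.hcauditor.org summary URL."""
    m = re.search(r'/view/re/(\d+)/', url)
    return m.group(1) if m else None

print(parcel_from_url('https://wedge.hcauditor.org/view/re/5500171005200/2018/summary'))
# 5500171005200
```

After the GET in the snippet above, the final URL (or the links inside the response HTML) should contain this number if the server redirects to the summary page.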


Upvotes: 2
