Jake

Reputation: 313

Web Scraping Pop-Up

I am new to web scraping and am trying to automate retrieving parcel information from a town website. I have over 300 parcels that I need the book and page numbers for.

This is the website: https://newmilfordct.mapgeo.io/datasets/properties?abuttersDistance=100&latlng=41.587864%2C-73.425014

When you go there, you can click on Search and enter an identifier (for example, 68/20); I have a list of all of these. The parcel's profile then comes up and I can get the book and page number.

This is what I have so far:

from bs4 import BeautifulSoup
from urllib.request import urlopen

url = "https://newmilfordct.mapgeo.io/datasets/properties?abuttersDistance=100&latlng=41.587864%2C-  73.425014"
page = urlopen(url)
html = page.read().decode("utf-8")
soup = BeautifulSoup(html, "html.parser")

I connect to the site but I cannot figure out how to interact with it. If someone could point me in the right direction it would be greatly appreciated and would save hours of work by hand.

Upvotes: 2

Views: 87

Answers (1)

baduker

Reputation: 20042

You can get the data for a given identifier by sending a POST request to the API URL.

Here's how to do it:

import requests

search_url = "https://newmilfordct.mapgeo.io/api/datasets/properties/search?format=json"

identifier = "68/20"

# request body as seen in the browser's network inspector
payload = {
    "page": 1,
    "quickSearch": identifier
}

# send the identifier as form data; the response is a JSON list of matching parcels
search_results = requests.post(search_url, data=payload).json()
# print(search_results)

for item in search_results:
    name = item['displayName']
    owner = item['ownerName']
    geometry = item['geometry']
    book = item['lastSaleBook']
    page = item['lastSalePage']
    print(f"Name: {name} | Owner: {owner}")
    print(f"Book/Page: {book}/{page}")
    print(geometry)
    print("-" * 80)

Output:

Name: 17 BUCKINGHAM LN | Owner: ROTELLI LOUIS
Book/Page: 0970/230
{"type":"Polygon","coordinates":[[[-73.4909038060549,41.6425898231357],[-73.4909821900848,41.6425591025291],[-73.4907493168393,41.6419510845828],[-73.4911769908149,41.6420353877],[-73.4915429751214,41.6418889484739],[-73.4915515509607,41.6418998161938],[-73.4919447199921,41.6423992451082],[-73.4920405021311,41.6425204818934],[-73.4919930203487,41.6425307775562],[-73.4919273071398,41.6425305146988],[-73.4917614178846,41.642552550643],[-73.491595684262,41.642581803258],[-73.4910018358319,41.6426901884681],[-73.4910019510053,41.6427258656192],[-73.4909038060549,41.6425898231357]]]}
--------------------------------------------------------------------------------
Name: 15 BUCKINGHAM LN | Owner: NEELANDS DOUGLAS S + SALOME S
Book/Page: 0330/394
{"type":"Polygon","coordinates":[[[-73.4904204439222,41.6413365201908],[-73.4908759926496,41.6411167792846],[-73.4909181970441,41.6410961714263],[-73.4915429751214,41.6418889484739],[-73.4911769908149,41.6420353877],[-73.4907493168393,41.6419510845828],[-73.4909821900848,41.6425591025291],[-73.4909038060549,41.6425898231357],[-73.4904204439222,41.6413365201908]]]}
--------------------------------------------------------------------------------

There's much more in the JSON. Just uncomment the print(search_results) line to see the entire response.
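
Since you mention a list of over 300 identifiers, here is a rough sketch of how the same request could be looped over that list and the book/page values written to a CSV. It assumes each result contains the same fields shown above; the file names identifiers.txt and book_page.csv are just placeholders for illustration:

import csv
import requests

search_url = "https://newmilfordct.mapgeo.io/api/datasets/properties/search?format=json"

# identifiers.txt is a placeholder: one identifier per line, e.g. 68/20
with open("identifiers.txt") as f:
    identifiers = [line.strip() for line in f if line.strip()]

rows = []
for identifier in identifiers:
    payload = {"page": 1, "quickSearch": identifier}
    results = requests.post(search_url, data=payload).json()
    for item in results:
        rows.append({
            "identifier": identifier,
            "name": item["displayName"],
            "owner": item["ownerName"],
            "book": item["lastSaleBook"],
            "page": item["lastSalePage"],
        })

# book_page.csv is a placeholder output file
with open("book_page.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["identifier", "name", "owner", "book", "page"])
    writer.writeheader()
    writer.writerows(rows)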

EDIT: A short note on the API.

You can take a sneak peek at what happens when you drop the identifier into the search field by opening the Developer Tools in your web browser. Then go to the Network tab and select the XHR filter.

Select the first item and choose Headers. There you will find the Request URL and the request payload.
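
If the request shown there carries a JSON body rather than form data (you can tell from the Content-Type request header), requests can reproduce that variant with json= instead of data=. A minimal sketch, assuming the same endpoint and payload as above:

import requests

search_url = "https://newmilfordct.mapgeo.io/api/datasets/properties/search?format=json"
payload = {"page": 1, "quickSearch": "68/20"}

# json= serializes the payload and sets the Content-Type: application/json header;
# use this only if the inspected request actually shows a JSON body
response = requests.post(search_url, json=payload)
response.raise_for_status()
print(response.json())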

Upvotes: 2
