ezeagwulae

Reputation: 299

scrape data from interactive map

I'm trying to get the data from each pop-up on the map. I've used BeautifulSoup in the past, but this is my first time getting data from an interactive map.

Any push in the right direction would be helpful. So far I'm getting empty results. Here's what I have; it isn't substantial:

from bs4 import BeautifulSoup as bs4
import requests

url = 'https://www.oaklandconduit.com/development_map'
r = requests.get(url).text
soup = bs4(r, "html.parser")
address = soup.find_all("div", {"class": "leaflet-pane leaflet-marker-pane"})

Update: Following the recommendations, I switched to parsing the JavaScript content with re, using the script below. However, loading the result with json.loads raises an error:

import json, re, requests
url = 'https://ebrrd.nationbuilder.com/themes/3/58597f55b92871671e000000/0/attachments/14822603711537993218/default/mapscript.js'
# .text returns a str; on Python 3, .content is bytes and cannot be searched with a str pattern
r = requests.get(url).text
content = re.findall(r'var.*?=\s*(.*?);', r, re.DOTALL | re.MULTILINE)[2]
# this is the call that raises the error
json_content = json.loads(content)

Upvotes: 2

Views: 4896

Answers (2)

ezeagwulae

Reputation: 299

I continued with regex to parse the map contents into JSON. Here's my approach, with comments, in case it's helpful to others.

import re, requests, json
url = 'https://ebrrd.nationbuilder.com/themes/3/58597f55b92871671e000000/0/attachments/14822603711537993218/default' \
      '/mapscript.js'
# .text returns a str; on Python 3, .content is bytes and would not work with the str regex below
r = requests.get(url).text
# use regex to get the geoJSON literal and replace single quotes with double quotes
content = re.findall(r'var geoJson.*?=\s*(.*?)// Add custom popups', r, re.DOTALL | re.MULTILINE)[0].replace("'", '"')
# quote the bare key type and remove tab characters (e.g. the trailing tab in "description" values)
content = re.sub(r"(type):", r'"type":', content).replace('\t', '')
# drop the last 5 characters, which include the trailing ";"
content = content[:-5]
json_content = json.loads(content)

I'm also open to other Pythonic approaches.
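
One possible variation, as a rough sketch rather than a tested replacement: let the regex stop at the terminating ";" so the fixed-length slice is not needed. The SAMPLE_JS string and the extract_geojson helper below are hypothetical stand-ins (the real script is not reproduced here), and the pattern assumes the ";" sits directly before the "// Add custom popups" comment. Text containing apostrophes would still break the single-quote replacement.

```python
import json
import re

# Hypothetical stand-in for the script text, shaped like the issues handled above:
# a bare `type` key, single-quoted strings, a trailing tab, and a closing ";".
SAMPLE_JS = """
var geoJson = {
    type: 'FeatureCollection',
    "features": [
        {type: 'Feature',
         "properties": {"title": 'Example Project', "description": 'Mixed use\t'}}
    ]
};
// Add custom popups
"""


def extract_geojson(js_text):
    """Convert the geoJson object literal in js_text into a Python dict."""
    # Capture everything between "var geoJson =" and the ";" before the comment.
    match = re.search(
        r"var\s+geoJson\s*=\s*(.*?)\s*;\s*// Add custom popups",
        js_text,
        re.DOTALL,
    )
    if match is None:
        raise ValueError("geoJson assignment not found")
    literal = match.group(1)
    # JSON requires double-quoted strings.
    literal = literal.replace("'", '"')
    # Quote the bare `type` key, skipping keys that are already quoted.
    literal = re.sub(r'(?<!")\btype\s*:', '"type":', literal)
    # Literal tab characters are not allowed inside JSON strings.
    literal = literal.replace("\t", "")
    return json.loads(literal)


data = extract_geojson(SAMPLE_JS)
print(data["features"][0]["properties"]["title"])
```

Anchoring on the ";" keeps the code from depending on exactly how many characters follow the closing brace.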

Upvotes: 1

Cole

Reputation: 1745

The interactive map is loaded and driven by JavaScript, so the requests library alone is not sufficient to get the data you want: it only retrieves the initial response (in this case, the HTML source) and does not run any scripts.

If you view the source for the page (on Chrome: view-source:https://www.oaklandconduit.com/development_map) you'll see that there is an empty div like so:

<div id='map'></div>

This is the placeholder div for the map.
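
To see why the original search comes back empty, here is a small sketch that parses a static page containing only that placeholder. The static_html string is a simplified, assumed stand-in for what requests receives, not the actual page source.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the HTML that requests receives (assumption):
# the map div exists, but the scripts that fill it have not run.
static_html = "<html><body><div id='map'></div></body></html>"

soup = BeautifulSoup(static_html, "html.parser")

# The placeholder is present but has no children.
map_div = soup.find("div", id="map")
print(map_div.contents)

# The same search as in the question finds nothing, because the
# Leaflet panes are only created in the browser by JavaScript.
markers = soup.find_all("div", {"class": "leaflet-pane leaflet-marker-pane"})
print(markers)
```

Both prints show an empty list: the marker elements simply do not exist until a browser executes the map script.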

You'll want a tool that lets the map load and lets you interact with it programmatically. Selenium can do this, but it will be significantly slower than requests because it has to launch and drive a real browser.
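
A minimal sketch of that approach, assuming Selenium 4 with Chrome installed. The helper name fetch_rendered_html is hypothetical, and the ".leaflet-marker-icon" selector is an assumption based on Leaflet's default marker class; adjust it to whatever the rendered page actually contains.

```python
def fetch_rendered_html(url, wait_selector=".leaflet-marker-icon", timeout=15):
    """Load url in headless Chrome and return the HTML after the map has rendered."""
    # Imported here so the module can be loaded even where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run without opening a browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until at least one marker element exists in the DOM (assumed selector).
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_selector))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Calling fetch_rendered_html('https://www.oaklandconduit.com/development_map') should return HTML that BeautifulSoup can then search for the marker elements. Popup content in Leaflet is typically only added to the DOM when a marker is clicked, so reading each pop-up may require clicking the markers through the driver as well.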

Upvotes: 1
