Reputation: 299
I'm trying to get the data from each pop-up on the map. I've used BeautifulSoup in the past, but this is my first time getting data from an interactive map.
Any push in the right direction is helpful. So far I'm getting blank results. Here's what I have; it isn't substantial:
from bs4 import BeautifulSoup as bs4
import requests
url = 'https://www.oaklandconduit.com/development_map'
r = requests.get(url).text
soup = bs4(r, "html.parser")
address = soup.find_all("div", {"class": "leaflet-pane leaflet-marker-pane"})
Updated
Following the recommendations, I switched to parsing the JavaScript content with re, using the script below. But loading the result with json.loads raises an error.
import json, re, requests
url = 'https://ebrrd.nationbuilder.com/themes/3/58597f55b92871671e000000/0/attachments/14822603711537993218/default/mapscript.js'
# .text returns str, which the str regex pattern needs on Python 3 (.content is bytes)
r = requests.get(url).text
content = re.findall(r'var.*?=\s*(.*?);', r, re.DOTALL | re.MULTILINE)[2]
# this is the call that raises the error
json_content = json.loads(content)
Upvotes: 2
Views: 4896
Reputation: 299
I continued with regex to parse the map contents into JSON. Here's my approach, with comments in case it helps others.
import re, requests, json
url = 'https://ebrrd.nationbuilder.com/themes/3/58597f55b92871671e000000/0/attachments/14822603711537993218/default' \
'/mapscript.js'
r = requests.get(url).text  # str, not bytes, so the str regex below also works on Python 3
# use regex to get geoJSON and replace single quotes with double
content = re.findall(r'var geoJson.*?=\s*(.*?)// Add custom popups', r, re.DOTALL | re.MULTILINE)[0].replace("'", '"')
# add quotes to key: "type" and remove trailing tab from value: "description"
content = re.sub(r"(type):", r'"type":', content).replace('\t', '')
# remove ";" from dict
content = content[:-5]
json_content = json.loads(content)
I'm also open to other Pythonic approaches.
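One possible alternative, as a minimal sketch rather than a tested solution for the real file: json.loads rejects a JavaScript object literal because JSON requires double-quoted strings and quoted keys. Instead of slicing off a fixed number of trailing characters, json.JSONDecoder.raw_decode can parse the first JSON value and ignore whatever follows it. The helper name extract_js_object and the SAMPLE_JS text are hypothetical stand-ins for illustration; the real mapscript.js may be laid out differently.

```python
import json
import re

# Hypothetical stand-in for the JavaScript file; the "\t" below becomes a
# literal tab inside the description value.
SAMPLE_JS = """
var map = L.mapbox.map('map');
var geoJson = {
    type: 'FeatureCollection',
    features: [{
        type: 'Feature',
        properties: {title: 'Example A', description: 'First project\t'},
        geometry: {type: 'Point', coordinates: [-122.27, 37.80]}
    }]
};
// Add custom popups
"""


def extract_js_object(js_source, var_name):
    """Return the object literal assigned to `var_name` as Python data."""
    match = re.search(r'var\s+' + re.escape(var_name) + r'\s*=\s*', js_source)
    if match is None:
        raise ValueError('variable %r not found' % var_name)
    literal = js_source[match.end():]
    # JSON needs double-quoted strings. This naive swap breaks if a value
    # contains an apostrophe.
    literal = literal.replace("'", '"')
    # JSON needs quoted keys: quote bare identifiers that follow "{" or ",".
    # This can misfire if a string value happens to contain ", word:".
    literal = re.sub(r'([{,]\s*)([A-Za-z_]\w*)\s*:', r'\1"\2":', literal)
    # strict=False tolerates literal control characters (such as tabs) inside
    # strings; raw_decode stops after the first complete JSON value.
    data, _end = json.JSONDecoder(strict=False).raw_decode(literal)
    return data


geo = extract_js_object(SAMPLE_JS, 'geoJson')
for feature in geo['features']:
    props = feature['properties']
    print(props['title'], '|', props['description'].strip())
```

Because the quote and key rewrites are plain text substitutions, this only suits simple literals; anything with nested quotes, comments, or function calls inside the object would need a real JavaScript parser.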
Upvotes: 1
Reputation: 1745
The interactive map is loaded and driven by JavaScript, so the requests library alone is not sufficient to get the data you want: it only gets you the initial response (in this case, the HTML source code).
If you view the source for the page (in Chrome: view-source:https://www.oaklandconduit.com/development_map), you'll see that there is an empty div like so:
<div id='map'></div>
This is the placeholder div for the map.
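To make that concrete, here is a small illustration of why the find_all call in the question comes back empty. Both HTML strings are hypothetical: STATIC_HTML mimics what requests would receive, and RENDERED_HTML is an assumed example of the DOM after the map script has run in a browser. The sketch uses only the standard library's html.parser.

```python
from html.parser import HTMLParser

# Assumed shape of the initial response: only the empty placeholder div.
STATIC_HTML = "<html><body><div id='map'></div></body></html>"

# Assumed shape of the DOM after JavaScript has built the map in a browser.
RENDERED_HTML = (
    "<html><body><div id='map' class='leaflet-container'>"
    "<div class='leaflet-pane leaflet-marker-pane'>"
    "<img class='leaflet-marker-icon'>"
    "</div></div></body></html>"
)


class ClassCollector(HTMLParser):
    """Collect every CSS class name that appears in the document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == 'class' and value:
                self.classes.update(value.split())


def classes_in(html):
    parser = ClassCollector()
    parser.feed(html)
    parser.close()
    return parser.classes


print('leaflet-marker-pane' in classes_in(STATIC_HTML))
print('leaflet-marker-pane' in classes_in(RENDERED_HTML))
```

The marker pane class is absent from the static markup, so a parser working on the raw response has nothing to match.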
You'll want to use a method that lets the map load and lets you interact with it programmatically. Selenium can do this for you, but it will be significantly slower than requests because it has to launch and drive a real browser to allow for this interactivity.
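A rough sketch of what that could look like with Selenium, under several assumptions: Chrome and a matching driver are available, the markers use Leaflet's default leaflet-marker-icon class, and the popups use leaflet-popup-content. I have not verified these selectors against the page, and the function name scrape_popups is made up for illustration.

```python
def scrape_popups(url, timeout=15):
    """Open `url` in Chrome, click each marker, and return the popup texts."""
    # Imported lazily so this module loads even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        # Assumed selector: Leaflet's default class for marker icons.
        markers = wait.until(
            EC.presence_of_all_elements_located(
                (By.CSS_SELECTOR, '.leaflet-marker-icon')
            )
        )
        texts = []
        for marker in markers:
            # May raise if another element overlaps the marker; a real
            # script would need to handle that (for example by panning).
            marker.click()
            # Assumed selector: Leaflet's default popup content class.
            popup = wait.until(
                EC.visibility_of_element_located(
                    (By.CSS_SELECTOR, '.leaflet-popup-content')
                )
            )
            texts.append(popup.text)
        return texts
    finally:
        # Always close the browser, even if a wait times out.
        driver.quit()


# Example call (launches a real browser, so it is left commented out):
# print(scrape_popups('https://www.oaklandconduit.com/development_map'))
```

If the marker data lives in a separate JavaScript file, fetching and parsing that file directly will usually be faster than driving a browser.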
Upvotes: 1