Julien Bourbon

Reputation: 41

Get geolocation from webpage by scraping using Selenium

I am trying to scrape this page:

The goal is to collect the latitude and the longitude. However, I see that the HTML content doesn't change after I submit anything in the "Adresse" field, and I don't know whether this is the cause of my empty lists.

My script:

import requests
from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://www.coordonnees-gps.fr/"

chrome_path = r"C:\Users\jbourbon\Desktop\chromedriver_win32 (1)\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.maximize_window()
driver.get(url)

r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

g_data = soup.findAll("div", {"class": "col-md-9"})

latitude = []
longitude = []

for item in g_data:
    latitude.append(item.contents[0].findAll("input", {"id": "latitude"}))
    longitude.append(item.contents[0].findAll("input", {"id": "longitude"}))

print(latitude)
print(longitude)

And here is what I get in my lists

Thanks :)

Upvotes: 1

Views: 958

Answers (2)

Ashish Ranjan

Reputation: 5543

There's nothing wrong with your approach. The problem is that when you open the link with Selenium or requests, the geolocation is not available instantly; it becomes available after a few seconds. A script (aaf9ec0.js) adds it to the HTML dynamically, so requests won't work anyway. It also seems that input#latitude does not hold the values, but you can get them from div#info_window.

I've modified your script a bit; it should get you the latitude and longitude, and it works for me:
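Because the values only appear after a short delay, one alternative to a fixed pause is to poll until they show up. The following is a minimal, library-independent sketch of that idea; `wait_until` and its parameters are hypothetical names used for illustration, not part of Selenium or of the original script. Selenium also ships its own `WebDriverWait` class for the same purpose.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success. (Hypothetical helper for illustration.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

With a Selenium driver, the predicate could be something like `lambda: "Latitude" in driver.page_source`, assuming the page eventually renders that label.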

from bs4 import BeautifulSoup
from selenium import webdriver
import time
import re

url = "https://www.coordonnees-gps.fr/"

driver = webdriver.Chrome()
driver.maximize_window()
driver.get(url)
time.sleep(2)  # wait for the geolocation to become ready

# no need to fetch the page again with requests; Selenium has already loaded it above
soup = BeautifulSoup(driver.page_source, "html.parser")

g_data = soup.findAll("div", {"id": "info_window"})[0]
# extract latitude and longitude
latitude, longitude = re.findall(r'Latitude :</strong> ([\.\d]*) \| <strong>Longitude :</strong> ([\.\d]*)<br/>', str(g_data))[0]
print(latitude)
print(longitude)

Output

25.594095
85.137565
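The regular expression above only matches unsigned numbers and raises an IndexError when nothing matches. A slightly more defensive sketch of the same extraction is shown below; `parse_coordinates` and `COORD_PATTERN` are hypothetical names, and the sample markup is an assumption reconstructed from the pattern in the code above, not copied from the live page.

```python
import re

# Accepts an optional minus sign so southern/western coordinates also match.
COORD_PATTERN = re.compile(
    r"Latitude\s*:</strong>\s*(-?\d+(?:\.\d+)?)\s*\|\s*"
    r"<strong>Longitude\s*:</strong>\s*(-?\d+(?:\.\d+)?)"
)


def parse_coordinates(html):
    """Return (latitude, longitude) as floats, or None if not found."""
    match = COORD_PATTERN.search(html)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


# Assumed shape of the div#info_window content, for illustration only.
sample = (
    '<div id="info_window"><strong>Latitude :</strong> 25.594095 | '
    '<strong>Longitude :</strong> 85.137565<br/></div>'
)
print(parse_coordinates(sample))
```

Returning None instead of raising makes it easy to retry if the page has not finished loading yet.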

Upvotes: 1

zamber

Reputation: 938

My best guess is that you have to enable GeoLocation for the chromedriver instance. See this answer for a lead on how to do that.
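As a rough sketch of what enabling geolocation might look like: Chrome has a content-settings preference that is commonly set through ChromeOptions. The preference key and its values (1 to allow, 2 to block) are an assumption here and have not been verified against this site; `geolocation_prefs` and `make_driver` are hypothetical helper names.

```python
# Assumed Chrome preference key controlling the geolocation permission.
GEOLOCATION_PREF = "profile.default_content_setting_values.geolocation"


def geolocation_prefs(allow=True):
    """Build the prefs dict: 1 allows geolocation, 2 blocks it (assumed values)."""
    return {GEOLOCATION_PREF: 1 if allow else 2}


def make_driver():
    """Start Chrome with geolocation allowed; requires selenium and chromedriver."""
    from selenium import webdriver  # imported lazily so the helper above works without it

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", geolocation_prefs())
    return webdriver.Chrome(options=options)
```

Calling `make_driver()` in place of `webdriver.Chrome()` would then open the page without the permission prompt, if the assumption about the preference holds.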

Upvotes: 0
