Reputation: 2219
I have a dataframe that has two columns, Hospital name and Address, and I want to iterate through each address to find the latitude and longitude. My code seems to be taking the first row in the dataframe and I can't seem to select the address to find the coordinates.
import pandas
from geopy.geocoders import Nominatim
geolocator = Nominatim()
for index, item in df.iterrows():
    location = geolocator.geocode(item)
    df["Latitude"].append(location.latitude)
    df["Longitude"].append(location.longitude)
Here is the code I used to scrape the website. Copy and run this and you'll have the data set.
import requests
from bs4 import BeautifulSoup
import pandas
import numpy as np
r=requests.get("https://www.privatehealth.co.uk/hospitals-and-clinics/orthopaedic-surgery/?offset=300")
c=r.content
soup=BeautifulSoup(c,"html.parser")
all=soup.find_all(["div"],{"class":"col-9"})
names = []
for item in all:
    d={}
    d["Hospital Name"] = item.find(["h3"],{"class":"mb6"}).text.replace("\n","")
    d["Address"] = item.find(["p"],{"class":"mb6"}).text.replace("\n","")
    names.append(d)
df=pandas.DataFrame(names)
df = df[['Hospital Name','Address']]
df
Currently the data looks like (one hospital example):
Hospital Name |Address
Fulwood Hospital|Preston, PR2 9SZ
The final output that I'm trying to achieve looks like this:
Hospital Name |Address | Latitude | Longitude
Fulwood Hospital|Preston, PR2 9SZ|53.7589938|-2.7051618
Upvotes: 1
Views: 3244
Reputation: 21274
Seems like there are a few issues here. Using data from the URL you provided:
df.head()
Hospital Name Address
0 Fortius Clinic City London, EC4N 7BE
1 Pinehill Hospital - Ramsay Health Care UK Hitchin, SG4 9QZ
2 Spire Montefiore Hospital Hove, BN3 1RD
3 Chelsea & Westminster Hospital London, SW10 9NH
4 Nuffield Health Tunbridge Wells Hospital Tunbridge Wells, TN2 4UL
(1) If your data frame column names really are Hospital name and Address, then you need to use item.Address in the call to geocode(). Just using item will give you both Hospital name and Address.
for index, item in df.iterrows():
    print(f"index: {index}")
    print(f"item: {item}")
    print(f"item.Address only: {item.Address}")
# Output:
index: 0
item: Hospital Name Fortius Clinic City
Address London, EC4N 7BE
Name: 0, dtype: object
item.Address only: London, EC4N 7BE
...
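A minimal sketch of the fix for point (1), assuming a stand-in function in place of geolocator.geocode so it runs without network access. FakeLocation and fake_geocode are hypothetical names made up for this illustration; the single address and its coordinates are the example from the question. The loop passes only the Address value and collects results in plain lists, which are assigned as columns once the loop finishes.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class FakeLocation:
    """Hypothetical stand-in exposing the two attributes used from a geocoding result."""
    latitude: float
    longitude: float


def fake_geocode(address):
    """Hypothetical replacement for geolocator.geocode; looks up a fixed table."""
    known = {"Preston, PR2 9SZ": FakeLocation(53.7589938, -2.7051618)}
    return known.get(address)


df = pd.DataFrame(
    {"Hospital Name": ["Fulwood Hospital"], "Address": ["Preston, PR2 9SZ"]}
)

latitudes, longitudes = [], []
for index, item in df.iterrows():
    # Pass only the address string, not the whole row.
    location = fake_geocode(item["Address"])
    latitudes.append(location.latitude)
    longitudes.append(location.longitude)

# Create the new columns in one step after the loop.
df["Latitude"] = latitudes
df["Longitude"] = longitudes
```

With a real geocoder, the call inside the loop would be geolocator.geocode(item["Address"]) instead of the stand-in.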
(2) You noted that your data frame only has two columns. If that's true, you'll get a KeyError when you try to perform operations on df["Latitude"] and df["Longitude"], because they don't exist.
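To illustrate point (2), here is a small sketch using a one-row frame built from the example in the question. It shows that reading a column that does not exist raises KeyError, and that assigning a list with one value per row creates the column. The variable name raised is only there to make the outcome visible.

```python
import pandas as pd

df = pd.DataFrame(
    {"Hospital Name": ["Fulwood Hospital"], "Address": ["Preston, PR2 9SZ"]}
)

try:
    df["Latitude"]  # the column has not been created yet
    raised = False
except KeyError:
    raised = True

# Assignment creates the column; the list needs one value per row.
df["Latitude"] = [53.7589938]
df["Longitude"] = [-2.7051618]
```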
(3) Using apply() on the Address column might be clearer than iterrows().
Note that this is a stylistic point, and debatable. (The first two points are actual errors.)
For example, using the provided URL:
from geopy.geocoders import Nominatim
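geolocator = Nominatim(user_agent="hospital-geocoder")  # placeholder name; recent geopy releases require an explicit user_agent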
tmp = df.head().copy()
latlon = tmp.Address.apply(lambda addr: geolocator.geocode(addr))
tmp["Latitude"] = [x.latitude for x in latlon]
tmp["Longitude"] = [x.longitude for x in latlon]
Output:
Hospital Name Address \
0 Fortius Clinic City London, EC4N 7BE
1 Pinehill Hospital - Ramsay Health Care UK Hitchin, SG4 9QZ
2 Spire Montefiore Hospital Hove, BN3 1RD
3 Chelsea & Westminster Hospital London, SW10 9NH
4 Nuffield Health Tunbridge Wells Hospital Tunbridge Wells, TN2 4UL
Latitude Longitude
0 51.507322 -0.127647
1 51.946413 -0.279165
2 50.840871 -0.180561
3 51.507322 -0.127647
4 51.131528 0.278068
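One thing to watch for: geocode() returns None when it finds no match, so x.latitude would then fail with an AttributeError. A hedged sketch of one way to guard against that is below. FakeLocation and fake_geocode are hypothetical stand-ins so the example runs offline, and the second hospital and address are invented purely to trigger the no-match case.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class FakeLocation:
    """Hypothetical stand-in exposing the two attributes used from a geocoding result."""
    latitude: float
    longitude: float


def fake_geocode(address):
    """Hypothetical replacement for geolocator.geocode; returns None when nothing matches."""
    known = {"Preston, PR2 9SZ": FakeLocation(53.7589938, -2.7051618)}
    return known.get(address)


df = pd.DataFrame(
    {
        "Hospital Name": ["Fulwood Hospital", "Made-up Clinic"],  # second row is invented
        "Address": ["Preston, PR2 9SZ", "Nowhere, ZZ0 0ZZ"],
    }
)

# Geocode every address once, then fill in NaN wherever no match was found.
locations = [fake_geocode(addr) for addr in df["Address"]]
nan = float("nan")
df["Latitude"] = [loc.latitude if loc is not None else nan for loc in locations]
df["Longitude"] = [loc.longitude if loc is not None else nan for loc in locations]
```

Rows that could not be geocoded end up with NaN and can be found afterwards with df[df["Latitude"].isna()]. When running this against the real Nominatim service over many rows, it is probably worth spacing out the requests, for example with geopy's RateLimiter helper.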
Upvotes: 3