Trinh Trong Anh

Reputation: 35

Identify US county from latitude and longitude using Python

I am using the code below to identify the US county for each business. The data comes from Yelp, which provides lat/lon coordinates.

id latitude longitude
1 40.017544 -105.283348
2 45.588906 -122.593331

import pandas
df = pandas.read_json("/Users/yelp/yelp_academic_dataset_business.json", lines=True, encoding='utf-8')

# Identify county
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="http")
df['county'] = geolocator.reverse(df['latitude'],df['longitude'])

The error was "TypeError: reverse() takes 2 positional arguments but 3 were given".

Upvotes: 2

Views: 1997

Answers (2)

dominic muscatella

Reputation: 66

You could use the US Census county boundary data together with geopandas.

imports

import urllib.parse
import requests
from pathlib import Path
from zipfile import ZipFile
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

get geometry data as a geopandas dataframe

src = [
    {
        "name": "counties",
        "suffix": ".shp",
        "url": "https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_county_5m.zip",
    },
]
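data = {}
print('gathering county data from census')
for s in src:
    # local path for the zip, named after the last component of the URL
    f = Path.cwd().joinpath(urllib.parse.urlparse(s["url"]).path.split("/")[-1])
    if not f.exists():
        # download once; later runs reuse the zip already on disk
        r = requests.get(s["url"], stream=True)
        with open(f, "wb") as fd:
            for chunk in r.iter_content(chunk_size=128):
                fd.write(chunk)

    # unpack into a folder next to the zip, named after it
    fz = ZipFile(f)
    fz.extractall(f.parent.joinpath(f.stem))

    # read the first archive member with the wanted suffix (.shp)
    data[s["name"]] = gpd.read_file(
        f.parent.joinpath(f.stem).joinpath([m.filename
                                            for m in fz.infolist()
                                            if Path(m.filename).suffix == s["suffix"]][0])
    ).assign(source_name=s["name"])
# combine all sources and reproject to EPSG:4326 (plain lat/lon)
gdf = pd.concat(data.values()).to_crs("EPSG:4326")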

Lockport Illinois coordinates

query_point = Point(-88.057510, 41.589401)

use geopandas contains() to filter the data

contains = gdf.contains(query_point)
data = gdf[contains]
print(data['NAME'])

prints 'Will'

link to documentation: https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.contains.html
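For a whole table of coordinates, like the Yelp data in the question, a spatial join may be more convenient than calling contains() once per point. The following is only a sketch under stated assumptions: the two rectangular "counties" are made-up stand-ins for the census gdf so that the example runs without the download, and the column name county is chosen for illustration.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Two toy "counties" standing in for the census GeoDataFrame (gdf above).
# These rectangles are invented for illustration, not real boundaries.
counties = gpd.GeoDataFrame(
    {"NAME": ["West", "East"]},
    geometry=[box(-106, 39, -105, 41), box(-105, 39, -104, 41)],
    crs="EPSG:4326",
)

# Coordinates shaped like the sample rows in the question.
df = pd.DataFrame(
    {
        "id": [1, 2],
        "latitude": [40.017544, 45.588906],
        "longitude": [-105.283348, -122.593331],
    }
)

# Build point geometries; note the (x, y) = (longitude, latitude) order.
points = gpd.GeoDataFrame(
    df.copy(),
    geometry=gpd.points_from_xy(df["longitude"], df["latitude"]),
    crs="EPSG:4326",
)

# A left join keeps every point; points outside all polygons get NaN.
joined = gpd.sjoin(
    points, counties[["NAME", "geometry"]], how="left", predicate="within"
)
df["county"] = joined["NAME"]
print(df)
```

With the real data, counties would be replaced by gdf from the download step. The predicate keyword assumes a reasonably recent geopandas release; older versions used a differently named argument.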

Upvotes: 0

hyperneutrino

Reputation: 5425

Nominatim.reverse takes a single coordinate pair as its one positional argument. You are passing it two arguments, and each of them is an entire pandas column: df['latitude'] refers to the whole column in your data, not just one value. Since geopy is independent of pandas, it doesn't support processing an entire column, and the extra positional argument is what triggers the TypeError.
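To see where the message comes from without touching the network, here is a small sketch. FakeGeocoder is a hypothetical stand-in, not part of geopy; it only assumes that reverse accepts one positional query and that every other option is keyword-only.

```python
class FakeGeocoder:
    """Hypothetical stand-in with the call shape assumed for Nominatim.reverse."""

    def reverse(self, query, *, exactly_one=True):
        # Echo the query back; a real geocoder would contact a server here.
        return query


geocoder = FakeGeocoder()

# Two positional arguments: Python counts self as well, hence "3 were given".
try:
    geocoder.reverse(40.017544, -105.283348)
    message = None
except TypeError as exc:
    message = str(exc)
print(message)

# One tuple holding both coordinates is accepted.
pair = geocoder.reverse((40.017544, -105.283348))
print(pair)
```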

Instead, try looping through the rows:

county = []

# reverse() is called once per row, with the coordinates as one (lat, lon) tuple
for lat, lon in zip(df['latitude'], df['longitude']):
    county.append(geolocator.reverse((lat, lon)))

(Note the double parentheses: the latitude and longitude are passed together as a single tuple.)

Then, insert the column into the dataframe:

df.insert(index, 'county', county, True)

(index should be the column position you want, and the boolean value at the end is allow_duplicates, which permits inserting a column whose name already exists.)
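One caveat: reverse returns a Location object (or None), not a county name, so the list still needs one more step. A minimal sketch follows, assuming the Nominatim response carries a county entry under raw['address'], which is common for US coordinates but not guaranteed. The helper names county_from_location and lookup_counties are made up for illustration.

```python
def county_from_location(location):
    """Return the county name from a geopy-style Location, or None if absent."""
    if location is None:
        return None
    # Assumption: the address details live under raw["address"].
    address = location.raw.get("address", {})
    return address.get("county")


def lookup_counties(reverse, latitudes, longitudes):
    """Call reverse((lat, lon)) for each coordinate pair and collect counties."""
    return [
        county_from_location(reverse((lat, lon)))
        for lat, lon in zip(latitudes, longitudes)
    ]


# Intended use with the objects from the question (needs network access):
# df['county'] = lookup_counties(geolocator.reverse, df['latitude'], df['longitude'])
```

For a dataset as large as Yelp's, wrapping geolocator.reverse in geopy.extra.rate_limiter.RateLimiter (for example with min_delay_seconds=1) should help keep the requests within the public Nominatim usage policy.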

Upvotes: 1
