Reputation: 1286
I'm analyzing tweets and need to determine which US state the user was in from their GPS coordinates. I won't have an internet connection available, so I can't use an online service such as the Google Maps API to reverse geocode.
Does anyone have any suggestions? I'm writing the script in Python, so if anyone knows of a Python library I can use, that would be great. Alternatively, a pointer to a research paper or an efficient algorithm I could implement would also be very helpful. I have found data that represents the state boundaries as GPS coordinates, but I can't think of an efficient way to determine which state the user's coordinates fall in.
Upvotes: 3
Views: 3756
Reputation: 141
This is a bit late, but it could be useful for some. It's a solution in Python. Go to this [link](https://eric.clst.org/tech/usgeojson/) and download a GeoJSON file for the US states.
Then try this.
Import the packages and load in the data from the GeoJSON
import json
import pandas as pd
from shapely.geometry import Polygon, Point, MultiPolygon
data = json.load(open('GeoJson/gz_2010_us_040_00_20m.json'))
df = pd.DataFrame(data["features"])
Extract the required fields from the GeoJSON
df['Location'] = df['properties'].apply(lambda x: x['NAME'])
df['Type'] = df['geometry'].apply(lambda x: x['type'])
df['Coordinates'] = df['geometry'].apply(lambda x: x['coordinates'])
Create Polygon or MultiPolygon objects depending on each state's geometry type.
rows = []
for idx, row in df.iterrows():
    if row['Type'] == 'MultiPolygon':
        # Build one Polygon per part from its outer ring, then combine them.
        list_of_polys = []
        for ll in row['Coordinates']:
            list_of_polys.append(Polygon(ll[0]))
        poly = MultiPolygon(list_of_polys)
    elif row['Type'] == 'Polygon':
        poly = Polygon(row['Coordinates'][0])
    else:
        poly = None
    row['Polygon'] = poly
    rows.append(row)
df_new = pd.DataFrame(rows)
Drop columns we don't need
df_selection = df_new.drop(columns=['type', 'properties', 'geometry', 'Coordinates'])
Feed in an example latitude and longitude and see the results. If it's not right the first time, swap your latitude and longitude ;-) (shapely points are (x, y), i.e. (longitude, latitude)).
point = Point(-81.47, 27.494)  # Example GPS location somewhere in Florida
state = df_selection.apply(lambda row: row['Location'] if row['Polygon'].contains(point) else None, axis=1).dropna()
print(state)
Upvotes: 1
Reputation: 1317
Looking at the shape of the states on a longitude/latitude map, it becomes obvious that probably 70% of the boundaries are aligned with the longitude/latitude axes. The others follow linear or near-linear paths. It seems like a "well crafted" BSP tree should be the fastest way to decide which state a location is in.
The definition of "well crafted" is hard to establish, but I'd suggest that you try to balance eliminating states (the whole of a state is on side A or B of this line) with quickly isolating large population centers. Ideally, if you have to subdivide a state with a line, try to do so such that the large population centers are on one side of the line.
Including population dispersal in your boundary creation should improve your average-case times. Given that quite a few states have boundaries that follow rivers, some paths in your tree will probably be very deep, but you should still save significant time over checking each state individually.
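To make the idea concrete, here is a minimal sketch of the query side of such a tree, assuming axis-aligned splits only and using shapely polygons for the exact test at the leaves; how to choose the splits (the "well crafted" part) is left out, and `BSPNode`/`locate` are illustrative names, not an existing library API.
from shapely.geometry import Point

class BSPNode:
    """Internal node splits on one axis; a leaf holds (name, polygon) candidates."""
    def __init__(self, axis=None, value=None, left=None, right=None, candidates=None):
        self.axis = axis              # 0 = longitude, 1 = latitude; None for a leaf
        self.value = value            # split coordinate
        self.left = left              # subtree for points with coordinate < value
        self.right = right            # subtree for points with coordinate >= value
        self.candidates = candidates  # leaf only: states whose polygons touch this cell

def locate(node, lon, lat):
    """Descend the tree, then run an exact point-in-polygon test on the few survivors."""
    while node.candidates is None:
        coord = (lon, lat)[node.axis]
        node = node.left if coord < node.value else node.right
    point = Point(lon, lat)
    for name, polygon in node.candidates:
        if polygon.contains(point):
            return name
    return None
Each query only walks a handful of split lines and then tests the one or two states left in the leaf, instead of testing all fifty polygons.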
Upvotes: 1
Reputation: 5304
Use a point-in-polygon algorithm to determine if the coordinate is inside a state (represented by a polygon with GPS coordinates as points). Practically speaking, it doesn't seem like you would be able to improve much upon simply checking each state one at a time, though some optimizations can be made if it's too slow.
However, parts of Alaska are on both sides of the 180th meridian, which causes problems. One solution would be to offset the coordinates a bit by adding 30 degrees to the longitude (wrapping back into the ±180 range) for each GPS coordinate (user coordinates and state coordinates). This has the effect of moving the 180th meridian about 30 degrees west, which should be enough to ensure that the entire US is on one side of it.
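For reference, a minimal sketch of the standard even-odd ray-casting test, assuming each state boundary is available as a plain list of (longitude, latitude) pairs; `point_in_polygon` is just an illustrative name.
def point_in_polygon(lon, lat, vertices):
    """Return True if (lon, lat) is inside the polygon given by its vertices."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Count boundary edges crossed by a horizontal ray going right from the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside
Checking all fifty states is then just a loop over their polygons, returning the first one that contains the point.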
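A sketch of that offset, using 30 degrees as the suggested value; apply the same shift to the user's point and to every polygon vertex before running the point-in-polygon tests.
def shift_longitude(lon, offset=30.0):
    # Move the seam from the 180th meridian to roughly 150 degrees east,
    # so the whole US (including the Aleutians) sits in one contiguous range.
    return (lon + offset + 180.0) % 360.0 - 180.0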
Upvotes: 3