Reputation: 437
I want to generate a random set of latitude and longitude coordinates within the US (including Hawaii and Alaska). I tried using a shapefile from the National Weather Service (https://www.weather.gov/gis/USstates), but it was generating points in the middle of the ocean. What is the best way to do this? I thought about defining my own polygon covering the interior of the US, but that would exclude some states. I’ve also seen similar questions where people used a CSV list of US cities, but I’d rather the points be completely random.
Upvotes: 2
Views: 1560
Reputation: 10203
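This one requires geopandas, but it's a quick and standard solution for sampling within odd shapes: generate points uniformly in the shape's bounding box and keep only those that fall inside the shape (rejection sampling, a simple form of Monte Carlo sampling). Most of the comments below the question outline the same concept.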
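Stripped of geopandas, the idea can be sketched in plain Python. This is a minimal sketch, assuming a single simple polygon given as a list of (lon, lat) vertices; point_in_polygon, rejection_sample and the L-shaped polygon are hypothetical stand-ins (not geopandas or shapely API), and the even-odd ray casting test is a simplified substitute for shapely's within().

```python
import random


def point_in_polygon(x, y, polygon):
    """Even-odd ray casting test: True if (x, y) lies inside the polygon.

    `polygon` is a list of (x, y) vertices; the ring is closed implicitly.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips in/out
    return inside


def rejection_sample(polygon, n, rng):
    """Draw n points uniformly (in x/y) inside `polygon` by rejection sampling."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    points = []
    while len(points) < n:
        # propose a point uniformly in the bounding box ...
        x = rng.uniform(x_min, x_max)
        y = rng.uniform(y_min, y_max)
        # ... and keep it only if it falls inside the shape
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points


# Hypothetical L-shaped "country" in lon/lat degrees (not real US data)
shape = [(-120, 30), (-80, 30), (-80, 40), (-100, 40), (-100, 50), (-120, 50)]
sample = rejection_sample(shape, 5, random.Random(2))
print(sample)
```

The bounding box is only the proposal region; points in the "notch" of the L (for example lon -90, lat 45) are rejected, exactly like ocean points in the US case.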
# load libraries
import numpy as np
import pandas as pd
import geopandas as gpd
# grab shape within which to sample
url = "https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_nation_20m.zip"
us = gpd.read_file(url).explode()
## filter out parts of the US far from the mainland: keep only parts whose eastern edge (maxx) lies west of -60,
## which drops anything at positive longitudes (across the 180th meridian)
us = us.loc[us.geometry.apply(lambda x: x.exterior.bounds[2]) < -60]
# combine the remaining parts once, then grab the bounding box within which to generate random numbers
us_union = us.geometry.unary_union
x_min, y_min, x_max, y_max = us_union.bounds
# the sampling
np.random.seed(2)  # set seed (needed for reproducible results)
N = 10000
rndn_sample = pd.DataFrame({'x': np.random.uniform(x_min, x_max, N), 'y': np.random.uniform(y_min, y_max, N)})  # actual generation
# re-save results in a geodataframe
rndn_sample = gpd.GeoDataFrame(rndn_sample, geometry=gpd.points_from_xy(x=rndn_sample.x, y=rndn_sample.y), crs=us.crs)
# filtering: vectorized within() against the union computed once (recomputing unary_union per point is very slow)
inUS = rndn_sample.geometry.within(us_union)  # check if within the U.S. bounds
rndn_sample.loc[inUS, :].plot()  # plot for visual inspection of results
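One caveat if "completely random" should mean uniform by area: drawing latitude uniformly in degrees over-represents high latitudes, because the spherical area element is proportional to cos(lat). Between roughly 20°N (Hawaii) and 70°N (northern Alaska) that bias is about cos(20°)/cos(70°) ≈ 2.7. A common fix is to draw sin(lat) uniformly instead. This is a sketch under that assumption; uniform_lon_lat_by_area and the box limits are hypothetical, illustrative values rather than part of the answer above.

```python
import math
import random


def uniform_lon_lat_by_area(x_min, x_max, y_min, y_max, n, rng):
    """Draw n (lon, lat) points uniformly by area inside a lon/lat box.

    Longitude is uniform in degrees, but latitude is drawn so that sin(lat)
    is uniform, which compensates for meridians converging toward the poles
    (the spherical area element is proportional to cos(lat) dlat dlon).
    """
    s_min = math.sin(math.radians(y_min))
    s_max = math.sin(math.radians(y_max))
    points = []
    for _ in range(n):
        lon = rng.uniform(x_min, x_max)
        lat = math.degrees(math.asin(rng.uniform(s_min, s_max)))
        points.append((lon, lat))
    return points


# Hypothetical box roughly spanning Hawaii to northern Alaska (illustrative values)
pts = uniform_lon_lat_by_area(-180.0, -65.0, 18.0, 72.0, 20000, random.Random(2))
north_share = sum(lat > 45.0 for _, lat in pts) / len(pts)
print(f"share of points north of 45 deg: {north_share:.3f}")
```

With uniform degrees about half of these points would land north of 45°N; with the area correction the share drops to the true area fraction (about 0.38 for this box). To plug this into the numpy code, the 'y' column could be generated as np.degrees(np.arcsin(np.random.uniform(np.sin(np.radians(y_min)), np.sin(np.radians(y_max)), N))); the within() filtering step stays the same.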
# grab shapefile of the US from an official source
url = "https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_nation_20m.zip"
us = gpd.read_file(url).explode()
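Note that with explode(), I expand the multi-part polygon into separate rows. This makes it easier to filter the area we are interested in, because we can grab the bounds of each part of the multi-part polygon, as below. The filter matters because any part lying at positive longitudes (across the 180th meridian) would otherwise stretch the bounding box across almost the whole globe, so nearly every random point would be rejected. Note that -60 is just a loose cutoff a few degrees east of the easternmost parts of the US we want to keep (Maine at about -67 and Puerto Rico at about -65 longitude). Feel free to decrease it to somewhere between those two values to exclude PR.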
## filter out parts of the US far from the mainland: keep only parts whose eastern edge (maxx) lies west of -60
us = us.loc[us.geometry.apply(lambda x: x.exterior.bounds[2])<-60]
# grab bounding box within which to generate random numbers
x_min,y_min,x_max,y_max = us.geometry.unary_union.bounds # save min and max x/y coords
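Note that unary_union re-combines the individual rows into a single multi-part polygon, and bounds then returns the min/max of the x and y coordinates of that filtered subset of the US (i.e. without the far-away parts dropped above). In geopandas 1.0 and later, unary_union is deprecated in favor of the union_all() method, which does the same thing.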
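In plain Python, the filtering and bounding-box steps amount to the following. This is a sketch assuming each exploded part is represented only by its (minx, miny, maxx, maxy) bounds; filter_parts, combined_bounds and the numbers in parts are hypothetical, hand-typed and rounded (not read from the Census file). It shows why the filter matters: a single part at positive longitude pushes maxx to the far side of the globe.

```python
def filter_parts(part_bounds, max_east=-60.0):
    """Keep parts whose eastern edge (maxx) lies west of `max_east`.

    Each entry is a (minx, miny, maxx, maxy) tuple, the same order
    shapely/geopandas use for .bounds.
    """
    return [b for b in part_bounds if b[2] < max_east]


def combined_bounds(part_bounds):
    """Bounding box of several parts: min of the mins, max of the maxes.

    This equals the bounds of their union, which is what
    us.geometry.unary_union.bounds returns in the answer's code.
    """
    minx = min(b[0] for b in part_bounds)
    miny = min(b[1] for b in part_bounds)
    maxx = max(b[2] for b in part_bounds)
    maxy = max(b[3] for b in part_bounds)
    return minx, miny, maxx, maxy


# Hypothetical, rounded bounds standing in for groups of parts
parts = [
    (-125.0, 24.5, -66.9, 49.4),   # roughly the contiguous states
    (-179.2, 51.2, -129.9, 71.4),  # roughly Alaska, east of the 180th meridian
    (-160.3, 18.9, -154.8, 22.2),  # roughly Hawaii
    (172.4, 52.4, 173.3, 53.0),    # an island west of the 180th meridian
]
print(combined_bounds(parts))                # (-179.2, 18.9, 173.3, 71.4)
print(combined_bounds(filter_parts(parts)))  # (-179.2, 18.9, -66.9, 71.4)
```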
# the sampling
np.random.seed(2)  # set seed (needed for reproducible results)
N = 10000
rndn_sample = pd.DataFrame({'x': np.random.uniform(x_min, x_max, N), 'y': np.random.uniform(y_min, y_max, N)})  # actual generation
# re-save results in a geodataframe
rndn_sample = gpd.GeoDataFrame(rndn_sample, geometry=gpd.points_from_xy(x=rndn_sample.x, y=rndn_sample.y), crs=us.crs)
# filtering: build the combined polygon once, not once per point
us_union = us.geometry.unary_union
inUS = rndn_sample.geometry.within(us_union)  # check if within the U.S. bounds (vectorized)
rndn_sample.loc[inUS, :].plot()  # plot for visual inspection of results
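The code above keeps however many of the N draws happen to land in the US. If you need exactly N points, one option is to keep drawing batches until enough are accepted. This is a minimal sketch; sample_until, its batch parameter and the is_inside predicate are hypothetical (is_inside stands in for the within() test), and the unit disk is just an easy-to-check example shape.

```python
import random


def sample_until(n, x_min, x_max, y_min, y_max, is_inside, rng, batch=1000):
    """Collect exactly n accepted points by drawing batches from the box.

    `is_inside(x, y)` is any point-in-shape test (a hypothetical stand-in
    for the within() check in the geopandas code).
    """
    accepted = []
    while len(accepted) < n:
        for _ in range(batch):
            x = rng.uniform(x_min, x_max)
            y = rng.uniform(y_min, y_max)
            if is_inside(x, y):
                accepted.append((x, y))
    return accepted[:n]  # trim the surplus from the last batch


def in_disk(x, y):
    """Hypothetical shape: the unit disk centred at the origin."""
    return x * x + y * y <= 1.0


pts = sample_until(500, -1.0, 1.0, -1.0, 1.0, in_disk, random.Random(2))
print(len(pts))  # 500
```

Trimming to the first n is safe because accepted points are independent draws, so the order they arrive in carries no bias. In the geopandas version, the same idea would mean repeating the generation and within() filtering, concatenating the accepted rows (e.g. with pd.concat) until at least N remain, then keeping the first N.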
Btw, here are the needed libraries in case it's ambiguous
# load libraries
import pandas as pd
import geopandas as gpd
import numpy as np
Upvotes: 7