Reputation: 780

Resample points in a Geodataframe so they stay equally spaced but inside area

I have a geopandas dataframe that contains a complex area and some Point objects representing latitude and longitue inside that same area, both reading from .kml and .xlsx files, not defined by me. Thing is that those points are really close to each other, and when ploting the whole map they overlap, making it really difficult to spot each one individually, specially when using geoplot.kdeplot()

So what I would like to do is to find some way to equally space them, respecting the boundaries of my area. For sake of simplicity, consder the Polygon object as the area and the Point objects as the ones to resample:

import random
import matplotlib.pyplot as plt

y = np.array([[100,100], [200,100], [200,200], [100,200], [50.0,150]])
p = Polygon(y)

points = []
for i in range(100):
    pt = Point(random.randint(50,199),random.randint(100,199))
    if p.contains(pt):
        points.append(pt)

xs = [point.x for point in points]
ys = [point.y for point in points]

fig = plt.figure()
ax1=fig.add_subplot(111)
x,y = p.exterior.xy
plt.plot(x,y)
ax1.scatter(xs, ys)
plt.show()

That gives something like this:

Any ideas on how to do it witout crossing the area? Thank you in advance!

EDIT:

Important to mention that the resample should not be arbitrary, meaning that a point in coordinates (100,100), for example, should be somewhere near its original location.

Upvotes: 2

Answers (2)

Ali_Sh

Reputation: 2816

This kind of problems can be handled in various ways such as force based method or geometrical method; Here, geometrical one is used.

I have considered the points to be as circles, so an arbitrary diameter can be specified for them, and also the area size for plotting with matplotlib. So we have the following codes for the beginning:

import numpy as np
from scipy.spatial import cKDTree
from shapely.geometry.polygon import Polygon
import matplotlib.pyplot as plt

np.random.seed(85)

# Points' coordinates and the specified diameter
coords = np.array([[3, 4], [7, 8], [3, 3], [1, 8], [5, 4], [3, 5], [7, 7]], dtype=np.float64)
points_dia = 1.1             # can be chosen arbitrary based on the problem

# Polygon creation
polygon_verts = np.array([[0, 0], [8.1, 0], [8.1, 8.1], [0, 8.1]])
p = Polygon(polygon_verts)

# Plotting
x, y = coords.T
colors = np.random.rand(coords.shape[0])
area = 700                   # can be chosen arbitrary based on the problem
min_ = coords.min() - 2*points_dia
max_ = coords.max() + 2*points_dia
plt.xlim(min_, max_)
plt.ylim(min_, max_)
xc, yc = p.exterior.xy
plt.plot(xc, yc)
plt.scatter(x, y, s=area, c=colors, alpha=0.5)
plt.show()

I separate the total answer to two steps:

dispersing (moving away) the suspected points from average position coordinates
curing the point-polygon overlaps

First step

For the first part, based on the number of points and neighbors, we can choose between Scipy and other libraries e.g. Scikit learn (my last explanations on this topic 1 (4th-5th paragraphs) and 2 can be helpful for such selections). Based on the (not large) sizes that the OP is mentioned in the comment, I will suggest using scipy.spatial.cKDTree, which has the best performance in this regard based on my experiences, to query the points. Using cKDTree.query_pairs we specify groups of points that are in the distance range of the diameter value from each other. Then by looping on the groups and averaging each group's points' coordinates, we can again use cKDTree.query to find the nearest point (k=1) of that group to the specified averaged coordinate. Now, it is easy to calculate distance of other points of that group to the nearest one and move them outward by a specified distance (here just as much as the overlaps):

nears = cKDTree(coords).query_pairs(points_dia, output_type='ndarray')
near_ids = np.unique(nears)
col1_ids = np.unique(nears[:, 0])

for i in range(col1_ids.size):

    pts_ids = np.unique(nears[nears[:, 0] == i].ravel())
    center_coord = np.average(coords[pts_ids], axis=0)
    nearest_center_id = pts_ids[cKDTree(coords[pts_ids]).query(center_coord, k=1)[1]]
    furthest_center_ids = pts_ids[pts_ids != nearest_center_id]

    vecs = coords[furthest_center_ids] - coords[nearest_center_id]
    dists = np.linalg.norm(vecs, axis=1) - points_dia
    coords[furthest_center_ids] = coords[furthest_center_ids] + vecs / np.linalg.norm(vecs, axis=1) * np.abs(dists)

Second step

For this part, we can loop on the modified coordinates (in the previous step) and find the two nearest coordinates (k=2) of the polygon's edges and project the point to that edge to find the closest coordinate on that line. So, again as step 1, we calculate overlaps and move the points to placing them inside the polygon. The points that where single (not in groups) will be moved shorter distances. This distances can be specified as desired and I have set them as a default value:

for i, j in enumerate(coords):
    for k in range(polygon_verts.shape[0]-1):
        nearest_poly_ids = cKDTree(polygon_verts).query(j, k=2)[1]

        vec_line = polygon_verts[nearest_poly_ids][1] - polygon_verts[nearest_poly_ids][0]
        vec_proj = j - polygon_verts[nearest_poly_ids][0]
        line_pnt = polygon_verts[nearest_poly_ids][0] + np.dot(vec_line, vec_proj) / np.dot(vec_line, vec_line) * vec_line

        vec = j - line_pnt
        dist = np.linalg.norm(vec)

        if dist < points_dia / 2:
            if i in near_ids:
                coords[i] = coords[i] + vec / dist * 2 * points_dia
            else:
                coords[i] = coords[i] + vec / dist

These codes, perhaps, could be optimized to get better performances, but it is not of importance for this question. I have checked it on my small example and need to be checked by larger cases. In all conditions, I think this will be one of the best ways to achieve the goal (may some changes based on the need) and can be a template which can be modified to satisfy any other needs or any shortcomings.

Upvotes: 2

raphael

Reputation: 2970

so this is not particularly fast due to the loop checking for "point in polygon", but as long as you don't intend to ave a very fine grid-spacing something like this would do the job:

from scipy.interpolate import NearestNDInterpolator 

# define a regular grid
gx, gy = np.meshgrid(np.linspace(50, 200, 100), 
                     np.linspace(100, 200, 100))

# get a interpolation function
# (just used "xs" as values in here...)
i = NearestNDInterpolator(np.column_stack((xs, ys)), xs)

# crop your grid to the shape
newp = []
for x, y in zip(gx.flat, gy.flat):
    pt = Point(x, y)
    if p.contains(pt):
        newp.append(pt.xy)

newp = np.array(newp).squeeze()

# evaluate values on the grid
vals = i(*newp.T)

ax1.scatter(*newp.T, c=vals, zorder=0)

Upvotes: 1

Resample points in a Geodataframe so they stay equally spaced but inside area

Answers (2)

First step

Second step

Related Questions