Reputation: 13
Assume a list of points or nodes, each with x, y and z coordinates. The distance between two points i and j is D(i,j) = sqrt((xi-xj)^2 + (yi-yj)^2 + (zi-zj)^2). I have 400,000 data points.
Now I want to select a subset of these nodes that are separated by a fixed inter-distance (specified beforehand as 0.05), so that the selected points are uniformly distributed.
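In NumPy terms, that distance for a single pair of points is simply (a minimal sketch; p_i and p_j are two example points):

import numpy as np

# D(i, j) = sqrt((xi - xj)^2 + (yi - yj)^2 + (zi - zj)^2)
p_i = np.array([131.404866, 16.176877, 128.120177])
p_j = np.array([131.355045, 16.176441, 128.115972])
d = np.sqrt(np.sum((p_i - p_j) ** 2))   # equivalently np.linalg.norm(p_i - p_j)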
Done with a while loop over the dataframe, the selection takes roughly 3 hours for the entire data set. I am looking for a faster method.
import numpy as np

# df already holds the 400,000 points; columns 1-3 are x, y, z,
# column 6 stores the distance to the previous kept row
no_rows = len(df)
i = 1
while i < no_rows:
    a1 = df.iloc[i - 1, 1]
    a2 = df.iloc[i, 1]
    b1 = df.iloc[i - 1, 2]
    b2 = df.iloc[i, 2]
    c1 = df.iloc[i - 1, 3]
    c2 = df.iloc[i, 3]
    dist = np.round(((a2 - a1)**2 + (b2 - b1)**2 + (c2 - c1)**2)**0.5, 5)
    df.iloc[i, 6] = dist
    if dist < 0.05:
        # too close to the previous kept point: drop this row and
        # compare the next row against the same previous point
        df = df.drop(i)
        df.reset_index(drop=True, inplace=True)
        no_rows = len(df)
        i = i - 1
    i += 1
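Most of those 3 hours go into the repeated df.iloc lookups and the df.drop / reset_index calls inside the loop rather than the distance arithmetic. The same greedy pass can be run over a plain NumPy array instead; a sketch, assuming the coordinate columns are named x, y, z (it keeps the exact drop logic but does not write the distance column back):

import numpy as np

# Same greedy thinning as the loop above, but on a contiguous array:
# keep a row only if it is at least 0.05 away from the last kept row.
coords = df[['x', 'y', 'z']].to_numpy()

keep = [0]                 # row 0 is always kept, as in the loop
last = coords[0]
for i in range(1, len(coords)):
    d = np.round(np.sqrt(np.sum((coords[i] - last) ** 2)), 5)
    if d >= 0.05:
        keep.append(i)
        last = coords[i]

thinned = df.iloc[keep].reset_index(drop=True)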
Upvotes: 1
Views: 578
Reputation: 2514
One option would be to use pandas directly and merge the dataframe onto itself (a cross join). Something like:
import pandas as pd
import numpy as np
df = pd.DataFrame(
    [
        [131.404866, 16.176877, 128.120177],
        [131.355045, 16.176441, 128.115972],
        [131.305224, 16.176005, 128.111767],
        [131.255403, 16.175569, 128.107562],
        [131.205582, 16.175133, 128.103357],
        [131.158858, 16.174724, 128.099413],
        [131.15576, 16.174702, 128.09916],
        [131.105928, 16.174342, 128.095089],
        [131.05988, 16.174009, 128.091328],
        [131.056094, 16.173988, 128.09103],
        [131.006249, 16.173712, 128.087107],
        [130.956404, 16.173436, 128.083184],
    ],
    columns=['x', 'y', 'z'],
)
# keep the original row number as a column before the merge
df.reset_index(drop=False, inplace=True)
dist = 0.05
# cross join: pair every point with every other point (columns get _x / _y suffixes)
df['CROSS'] = 1
df = df.merge(df, on="CROSS")
df.reset_index(drop=True, inplace=True)
df['distance'] = np.round(
    np.sqrt(
        np.square(df['x_x'] - df['x_y'])
        + np.square(df['y_x'] - df['y_y'])
        + np.square(df['z_x'] - df['z_y'])
    ),
    5,
)
# drop pairs where the distance is 0 (a point paired with itself)
ix = df[df.distance == 0].index
df.drop(ix, inplace=True)

print('These are all pairs of points matching the distance', dist)
ix = df[df.distance.astype(float) == dist].index
df.sort_values('distance', inplace=True)
print(df.loc[ix])
print('-' * 50)
points = pd.DataFrame(
    df.loc[ix, ['index_x', 'x_x', 'y_x', 'z_x']].values.tolist()
    + df.loc[ix, ['index_y', 'x_y', 'y_y', 'z_y']].values.tolist(),
    columns=['index', 'x', 'y', 'z'],
)
points.drop_duplicates(keep='first', inplace=True)
print('These are all the points which have another at distance', dist)
print(points)
NumPy's functions are far faster than any explicit Python loop and let you treat the whole dataset at once.
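For instance, the consecutive-row distances that the question's loop computes one pair at a time can be obtained for the whole (original, unmerged) frame in a single vectorized call. A minimal sketch, assuming the columns are named x, y, z (step_distance is just an illustrative column name):

import numpy as np

# distance from each row to the previous row, for the entire frame at once
coords = df[['x', 'y', 'z']].to_numpy()
step = np.linalg.norm(np.diff(coords, axis=0), axis=1)
df['step_distance'] = np.concatenate(([np.nan], np.round(step, 5)))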
Another option could be to use geopandas (it can also be very fast, but I'm not sure that would be the case here: its fastest path relies on pyproj's distance computation, which is written in C, and I don't think there is a 3D version of it).
Upvotes: 1