Compare each data point against every data point between two dataframes without looping

Question

I'd like to check coordinates (x,y,z) from dataframe-1 (df1) to see if the location is close enough to an irregular surface that has its own coordinates (x,y,z) stored in dataframe-2 (df2).

I can go through each coordinate in df1, loop through all coordinates in df2, and check its distance to each one, then repeat for every coordinate in df1. But this would take far too long when I have over 1,000,000 coordinates in df1 to check.

I'm using pandas and wondering if it can be done without looping.

If a coordinate in df1 is close to df2, I want to select it and store it in df3.

bubble · Accepted Answer

SciPy can help here. Consider the following hypothetical example:

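import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

# Random stand-ins for the two coordinate sets (columns are x, y, z)
dataset1 = pd.DataFrame(np.random.rand(100, 3))
dataset2 = pd.DataFrame(np.random.rand(10, 3))

# Build a k-d tree on dataset1
ck = cKDTree(dataset1.values)

# For each point in dataset2, list the indices of dataset1 points within r
ck.query_ball_point(dataset2.values, r=0.1)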

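Output (one list per row of dataset2, holding the indices of the dataset1 rows within r; the values vary from run to run because the data is random):

array([list([]), list([]), list([]), list([]), list([28, 83]), list([79]), list([]), list([86]), list([40]), list([29, 60, 95])], dtype=object)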
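
To get from neighbour lists to the df3 the question asks for, one possible variation is to build the tree on the surface points (df2) instead and ask for each df1 point's nearest surface point within a cutoff. This is a sketch under assumptions, not the only way to do it: the column names x, y, z, the threshold r, and the random data are all hypothetical placeholders.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

# Hypothetical stand-in data: df1 holds the points to check, df2 the surface.
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.random((1000, 3)), columns=["x", "y", "z"])
df2 = pd.DataFrame(rng.random((50, 3)), columns=["x", "y", "z"])

r = 0.1  # assumed "close enough" distance; choose to suit your data

# Build the tree on the surface points, then query every df1 point at once.
tree = cKDTree(df2[["x", "y", "z"]].to_numpy())
dist, _ = tree.query(df1[["x", "y", "z"]].to_numpy(), k=1, distance_upper_bound=r)

# query() reports an infinite distance when no neighbour lies within r,
# so the finite entries mark the df1 rows that are close to the surface.
df3 = df1[np.isfinite(dist)]
```

Building the tree on the smaller surface set and querying the large df1 in a single call keeps the Python-level loop out of the picture, which matters most when df1 has around a million rows.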
