Conditional Nearest Neighbor in Python

Question

I’m trying to do some nearest neighbour type analysis in Python using Pandas/Numpy/Scipy etc. and having tried a few different approaches, I’m stumped.

I have 2 dataframes as follows:

df1

Lon1    Lat1    Type
10      10      A
50      50      A
20      20      B

df2

Lon2    Lat2    Type    Data-1  Data-2  
11      11      A       Eggs    Bacon       
51      51      A       Nuts    Bread   
61      61      A       Beef    Lamb    
21      21      B       Chips   Chicken
31      31      B       Sauce   Pasta
71      71      B       Rice    Oats
81      81      B       Beans   Peas
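For anyone who wants to reproduce this, the two frames above can be built roughly as follows. This is just a sketch with the values copied from the tables; the integer dtypes are an assumption, since the real data presumably has float coordinates.

```python
import pandas as pd

# Points to look up neighbours for (values copied from the df1 table above)
df1 = pd.DataFrame({
    "Lon1": [10, 50, 20],
    "Lat1": [10, 50, 20],
    "Type": ["A", "A", "B"],
})

# Candidate neighbours with their payload columns (values copied from the df2 table above)
df2 = pd.DataFrame({
    "Lon2": [11, 51, 61, 21, 31, 71, 81],
    "Lat2": [11, 51, 61, 21, 31, 71, 81],
    "Type": ["A", "A", "A", "B", "B", "B", "B"],
    "Data-1": ["Eggs", "Nuts", "Beef", "Chips", "Sauce", "Rice", "Beans"],
    "Data-2": ["Bacon", "Bread", "Lamb", "Chicken", "Pasta", "Oats", "Peas"],
})
```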

For each row of df1, I’m trying to identify the 2 nearest neighbours in df2 (based upon the Lon / Lat values using Euclidean distance) and then merge the appropriate Data-1 and Data-2 values onto df1 so it looks like this:

Lon1    Lat1    Type    Data-1a     Data-2a     Data-1b     Data-2b
10      10      A       Eggs        Bacon       Nuts        Bread
50      50      A       Nuts        Bread       Beef        Lamb
20      20      B       Chips       Chicken     Sauce       Pasta

I’ve tried both long-form and wide-form approaches and am leaning toward using cKDTree from scipy. However, is there a way to do this so that it only looks at rows with the matching Type?

Thanks in advance.

** Edit **

I've made some progress as follows:

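from scipy.spatial import cKDTree

# One sub-frame of df2 per Type, keyed by the Type value
Typelist = df2['Type'].unique().tolist()
df_dict = {'{}'.format(x): df2[(df2['Type'] == x)] for x in Typelist}

def treefunc(row):
    # Still hard-coded to Type 'A'; rows of any other Type return None
    if row['Type'] == 'A':
        row_type = row['Type']  # renamed from `type` to avoid shadowing the builtin
        location = row[['Lon1','Lat1']].values.astype(float)
        # Build a tree from the df2 rows that share this row's Type
        tree = cKDTree(df_dict[row_type][['Lon2','Lat2']].values)
        dists, indexes = tree.query(location, k=2)
        return dists,indexes

dftest = df1.apply(treefunc,axis=1)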

This gives me a list of the distances and indexes of the 2 nearest neighbours which is great! However I still have some issues:

I tried to test the row['Type'] column for membership of the Typelist using .isin but this didn't work - are there any other ways to do this?
How can I get Pandas to create new columns for the dists and indexes produced by the kdtree?
Also how can I return Data-1 and Data-2 using the indexes?

Thanks in advance.
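One possible way to address the three points above, offered as a sketch rather than a definitive answer. On the membership test: inside `apply(..., axis=1)`, `row['Type']` is a plain scalar, so `.isin` (a Series method) is not available, but `row['Type'] in Typelist` works. The sketch below sidesteps the test altogether by grouping df1 on Type and building one cKDTree per group. The helper name `two_nearest_by_type`, the `Dist-a`/`Dist-b` column names, and the choice to leave rows without enough same-Type candidates unfilled are all assumptions made for illustration.

```python
import pandas as pd
from scipy.spatial import cKDTree

df1 = pd.DataFrame({"Lon1": [10, 50, 20], "Lat1": [10, 50, 20],
                    "Type": ["A", "A", "B"]})
df2 = pd.DataFrame({
    "Lon2": [11, 51, 61, 21, 31, 71, 81],
    "Lat2": [11, 51, 61, 21, 31, 71, 81],
    "Type": ["A", "A", "A", "B", "B", "B", "B"],
    "Data-1": ["Eggs", "Nuts", "Beef", "Chips", "Sauce", "Rice", "Beans"],
    "Data-2": ["Bacon", "Bread", "Lamb", "Chicken", "Pasta", "Oats", "Peas"],
})


def two_nearest_by_type(left, right, k=2):
    """Hypothetical helper: attach the k nearest same-Type rows of `right`."""
    pieces = []
    for type_value, left_group in left.groupby("Type", sort=False):
        # Positional index 0..n-1 so the tree's indexes map straight onto rows
        candidates = right[right["Type"] == type_value].reset_index(drop=True)
        out = left_group.copy()
        if len(candidates) < k:
            # Assumption: not enough neighbours of this Type, leave row unfilled
            pieces.append(out)
            continue
        tree = cKDTree(candidates[["Lon2", "Lat2"]].to_numpy(dtype=float))
        points = left_group[["Lon1", "Lat1"]].to_numpy(dtype=float)
        dists, idx = tree.query(points, k=k)
        # query returns 1-D arrays when k == 1; force (n_rows, k) either way
        dists = dists.reshape(len(left_group), -1)
        idx = idx.reshape(len(left_group), -1)
        for j in range(k):
            suffix = "abcdefghijklmnopqrstuvwxyz"[j]  # a, b, ... per neighbour
            out["Dist-" + suffix] = dists[:, j]
            # Use the positional indexes to pull the payload columns
            out["Data-1" + suffix] = candidates["Data-1"].to_numpy()[idx[:, j]]
            out["Data-2" + suffix] = candidates["Data-2"].to_numpy()[idx[:, j]]
        pieces.append(out)
    # Restore the original row order of `left`
    return pd.concat(pieces).sort_index()


result = two_nearest_by_type(df1, df2)
print(result)
```

Querying a whole group at once avoids rebuilding the tree for every row, which is what the `apply`-based version does. The indexes returned by `tree.query` are positions within the per-Type subset, which is why that subset is re-indexed from 0 before the lookup.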
