Dre

Reputation: 723

Find the distance between a list of points in two columns with one list comprehension in Python

I need to create a column that will be made up of a list of lists that represents the distance between points. I am trying to create this list of distances in one list comprehension or the most efficient way possible.

Here is the starting data frame df:

ID        list_1              list_2
00    [(10,2),(5,7)]      [(11,3),(9,9)]
01    [(1,7)]             [(9,1),(2,1),(6,3)]
02    [(4,2),(9,4)]       [(3,7)]

Here is the data frame df that I want to end up with. For each row, I need the distance between every tuple in column list_2 and every tuple in column list_1.

ID        list_1              list_2               distances
00    [(10,2),(5,7)]      [(11,3),(9,9)]    [[1.41,7.21],[7.07,4.47]]
01    [(1,7)]             [(9,1),(2,1)]     [[10.0,6.08]]

I end up doing six list comprehensions before I reach the end goal, but I am sure there is a more efficient way.

What I am doing:

import pandas as pd
import math

step 1

df['x'] = [[s[0] for s in object_slice] for object_slice in df['list_1']]

step 2

df['y'] = [[s[1] for s in object_slice] for object_slice in df['list_1']]

step 3

df['dist_p1'] = [[(df['x'][a] - s[1],df['y'][a] - s[0]) for s in object_slice]for a, object_slice in enumerate(df['list_2'])]

step 4

df['dist_p2'] = [[s[0] for s in object_slice] for object_slice in df['dist_p1']]

step 5

df['dist_p3'] = [[s[1] for s in object_slice] for object_slice in df['dist_p1']]

step 6

df['distances'] = [[[round(math.hypot(s2,df['dist_p2'][a][b][c]),2) for c, s2 in enumerate(s)] for b,s in enumerate(object_slice)] for a, object_slice in enumerate(df['dist_p1'])]

Upvotes: 0

Views: 313

Answers (1)

fusion

Reputation: 1397

OP:

Your original code throws an error at step 3, so I cannot reproduce your result.

Also, the calculation logic seems to be inconsistent between row 00 and row 01 in your example result.

In row 00,

[[1.41,7.21],[7.07,4.47]]=[[distance((11,3),(10,2)),distance((11,3),(5,7))],
                           [distance((9,9),(10,2)),distance((9,9),(5,7))]]

Here list_2 is the outer loop and list_1 is the inner loop.

However in row 01,

[[10.0,6.08]] = [[distance((1,7),(9,1)), distance((1,7),(2,1))]]

Here list_1 is the outer loop and list_2 is the inner loop.

In other words, the order of the nested loops differs between row 00 and row 01 in your example result.
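
To make the difference concrete, here is a minimal sketch using the row 01 points from your example. The helper name pairwise is my own, chosen only for illustration. Swapping the loop order changes the shape of the result, not just the order of the values:

```python
import math

def pairwise(outer, inner):
    # One inner list per point in `outer`, holding the rounded
    # Euclidean distance to every point in `inner`.
    return [
        [round(math.hypot(o[0] - p[0], o[1] - p[1]), 2) for p in inner]
        for o in outer
    ]

list_1 = [(1, 7)]
list_2 = [(9, 1), (2, 1)]

list_1_outer = pairwise(list_1, list_2)  # list_1 is the outer loop
list_2_outer = pairwise(list_2, list_1)  # list_2 is the outer loop

print(list_1_outer)  # [[10.0, 6.08]]
print(list_2_outer)  # [[10.0], [6.08]]
```

So the [[10.0,6.08]] in your row 01 can only come from list_1 being the outer loop.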


Here is what I would do, using list_1 as the outer loop:

df['distances']=df.apply(lambda row:[[round(math.hypot(i[0]-j[0],i[1]-j[1]),2) for j in row['list_2']] for i in row['list_1']],axis=1)

Returns:

    list_1              list_2              distances
0   [(10, 2), (5, 7)]   [(11, 3), (9, 9)]   [[1.41, 7.07], [7.21, 4.47]]
1   [(1, 7)]            [(9, 1), (2, 1)]    [[10.0, 6.08]]

If you need list_2 as the outer loop, simply swap list_1 and list_2 in the lambda function.
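
Since the question asks for a single list comprehension, here is a sketch of the list_2-as-outer-loop variant without apply. It is an alternative, not a claim about what is fastest. I assume each cell holds a list of (x, y) tuples, and the plain lists list_1_col and list_2_col (names made up for this example) stand in for df['list_1'] and df['list_2']:

```python
import math

# Plain lists standing in for df['list_1'] and df['list_2']
# (an assumption for illustration; no data frame is needed to run this).
list_1_col = [[(10, 2), (5, 7)], [(1, 7)]]
list_2_col = [[(11, 3), (9, 9)], [(9, 1), (2, 1)]]

# One comprehension: rows are paired with zip, list_2 is the outer loop,
# so each point in list_2 gets its own inner list of distances.
distances = [
    [[round(math.hypot(j[0] - i[0], j[1] - i[1]), 2) for i in l1] for j in l2]
    for l1, l2 in zip(list_1_col, list_2_col)
]

print(distances)
# [[[1.41, 7.21], [7.07, 4.47]], [[10.0], [6.08]]]
```

With a data frame, the same comprehension should work with zip(df['list_1'], df['list_2']) and the result assigned to df['distances'].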

Upvotes: 1
