I need to create a column that will be made up of a list of lists that represents the distance between points. I am trying to create this list of distances in one list comprehension or the most efficient way possible.
Here is the beginning data frame df
ID list_1 list_2
00 [(10,2),(5,7)] [(11,3),(9,9)]
01 [(1,7)] [(9,1),(2,1),(6,3)]
02 [(4,2),(9,4)] [(3,7)]
Here is the ending df data frame that I want. Essentially, for each row, every tuple in column list_2 needs to find the distance between itself and every tuple in column list_1.
ID list_1 list_2 distances
00 [(10,2),(5,7)] [(11,3),(9,9)] [[1.41,7.21],[7.07,4.47]]
01 [(1,7)] [(9,1),(2,1)] [[10.0,6.08]]
I end up doing six list comprehensions before I get to the end goal but I am sure there is a more efficient way.
What I am doing:
import pandas as pd
import math
step 1
df['x'] = [[s[1] for s in object_slice] for object_slice in df['list_1']]
step 2
df['y'] = [[s[1] for s in object_slice] for object_slice in df['list_1']]
step 3
df['dist_p1'] = [[(df['x'][a] - s[1],df['y'][a] - s[0]) for s in object_slice]for a, object_slice in enumerate(df['list_2'])]
step 4
df['dist_p2'] = [[s[0] for s in object_slice] for object_slice in df['dist_p1']]
step 5
df['dist_p3'] = [[s[1] for s in object_slice] for object_slice in df['dist_p1']]
step 6
df['distances'] = [[[round(math.hypot(s2,df['dist_p2'][a][b][c]),2) for c, s2 in enumerate(s)] for b,s in enumerate(object_slice)] for a, object_slice in enumerate(df['dist_p1'])]
OP:
Your original code throws an error at step 3, so I cannot reproduce your result.
However, your calculation logic seems to be inconsistent between row 00 and row 01 in your example result.
Because:
In row 00,
[[1.41,7.21],[7.07,4.47]] = [[distance((11,3),(10,2)), distance((11,3),(5,7))],
[distance((9,9),(10,2)), distance((9,9),(5,7))]]
Here list_2 is the outer loop and list_1 is the inner loop.
However, in row 01,
[[10.0,6.08]] = [[distance((1,7),(9,1)), distance((1,7),(2,1))]]
Here list_1 is the outer loop and list_2 is the inner loop.
In other words, the order of the nested loops differs between row 00 and row 01 in your example result.
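As a quick check of the point above, here is a minimal sketch that computes both loop orders for the two sample rows. The helper name pairwise and the row00/row01 dictionaries are made up for illustration, and row 01 uses list_2 = [(9,1),(2,1)] as shown in the expected output rather than the three-point list in the starting data frame.

```python
import math

def pairwise(outer, inner):
    """Distance from each point in `outer` to each point in `inner`, rounded to 2 places."""
    return [
        [round(math.hypot(o[0] - i[0], o[1] - i[1]), 2) for i in inner]
        for o in outer
    ]

# Sample rows from the question (hypothetical variable names).
row00 = {"list_1": [(10, 2), (5, 7)], "list_2": [(11, 3), (9, 9)]}
row01 = {"list_1": [(1, 7)], "list_2": [(9, 1), (2, 1)]}

# list_2 as the outer loop
l2_outer_00 = pairwise(row00["list_2"], row00["list_1"])  # [[1.41, 7.21], [7.07, 4.47]]
l2_outer_01 = pairwise(row01["list_2"], row01["list_1"])  # [[10.0], [6.08]]

# list_1 as the outer loop
l1_outer_00 = pairwise(row00["list_1"], row00["list_2"])  # [[1.41, 7.07], [7.21, 4.47]]
l1_outer_01 = pairwise(row01["list_1"], row01["list_2"])  # [[10.0, 6.08]]
```

The expected value for row 00 matches only the list_2-outer result, while the expected value for row 01 matches only the list_1-outer result.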
That said, here is what I would do with list_1 as the outer loop:
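# For each row: outer loop over points i in list_1, inner loop over points j in list_2.
df['distances'] = df.apply(
    lambda row: [
        [round(math.hypot(i[0] - j[0], i[1] - j[1]), 2) for j in row['list_2']]
        for i in row['list_1']
    ],
    axis=1,
)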
Returns:
list_1 list_2 distances
0 [(10, 2), (5, 7)] [(11, 3), (9, 9)] [[1.41, 7.07], [7.21, 4.47]]
1 [(1, 7)] [(9, 1), (2, 1)] [[10.0, 6.08]]
If you need to use list_2 as the outer loop, you can simply swap list_1 and list_2 in the lambda function.
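For reference, the swapped version might look like the sketch below. It assumes a data frame rebuilt by hand from the first two sample rows, with row 01 using list_2 = [(9,1),(2,1)] as in the expected output, so treat it as an illustration rather than a drop-in for your real data.

```python
import math
import pandas as pd

# Sample data rebuilt from the question (an assumption about how df was constructed).
df = pd.DataFrame({
    "list_1": [[(10, 2), (5, 7)], [(1, 7)]],
    "list_2": [[(11, 3), (9, 9)], [(9, 1), (2, 1)]],
})

# Outer loop over points j in list_2, inner loop over points i in list_1.
df["distances"] = df.apply(
    lambda row: [
        [round(math.hypot(j[0] - i[0], j[1] - i[1]), 2) for i in row["list_1"]]
        for j in row["list_2"]
    ],
    axis=1,
)

print(df["distances"].tolist())
# [[[1.41, 7.21], [7.07, 4.47]], [[10.0], [6.08]]]
```

With list_2 as the outer loop, row 0 matches the expected [[1.41,7.21],[7.07,4.47]], but row 1 becomes [[10.0],[6.08]] instead of [[10.0,6.08]], which is the inconsistency described above.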