user9410826

How to measure distance between XY points using pandas

I have code that measures the distance between XY coordinates but I'm hoping to make this more efficient through the use of pandas.

Let's say I have the XY coordinates of some subjects:

id_X = [1,2,7,19] #Subject 1
id_Y = [2,5,5,7] #Subject 1
cd_X = [3,3,8,20] #Subject 2
cd_Y = [2,5,6,7] #Subject 2

And I want to measure the distance of these subjects against another important XY coordinate:

Factor_X = [10,20,30,20] #Important XY
Factor_Y = [2,5,6,7] #Important XY

To get the distance of the first subject I use the following and iterate through each row.

dist = math.sqrt(((id_X[0] - Factor_X[0])**2)+((id_Y[0] - Factor_Y[0])**2))
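For reference, the row-by-row iteration described above could be sketched as follows. The loop itself is not shown in the post, so the `zip`-based list comprehension and the name `id_distances` are assumptions made for illustration only.

```python
import math

id_X = [1, 2, 7, 19]          # Subject 1
id_Y = [2, 5, 5, 7]           # Subject 1
Factor_X = [10, 20, 30, 20]   # Important XY
Factor_Y = [2, 5, 6, 7]       # Important XY

# Hypothetical loop: one Euclidean distance per row, using scalar math.sqrt.
id_distances = [
    math.sqrt((x - fx) ** 2 + (y - fy) ** 2)
    for x, y, fx, fy in zip(id_X, id_Y, Factor_X, Factor_Y)
]
print(id_distances)
```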

To get the distance of the second subject I would swap id_X, id_Y for cd_X, cd_Y.

This becomes very inefficient if I have numerous subjects. Therefore, I'm trying to implement the same concept but through pandas.

The following is my attempt:

import math
import pandas as pd

d = {
    'id_X': [1, 2, 7, 19],
    'id_Y': [2, 5, 5, 7],
    'cd_X': [3, 3, 8, 20],
    'cd_Y': [2, 5, 6, 7],
    'Factor_X': [10, 20, 30, 20],
    'Factor_Y': [2, 5, 6, 7],
}

df = pd.DataFrame(data=d)

df['distance'] = math.sqrt(((df['id_X']-df['Factor_X'])**2)+((df['id_Y']-df['Factor_Y'])**2))
df['distance'] = math.sqrt(((df['cd_X']-df['Factor_X'])**2)+((df['cd_Y']-df['Factor_Y'])**2))

But this returns an error:

TypeError: cannot convert the series to <class 'float'>

Intended Output:

   id_X  id_Y  cd_X cd_Y  Factor_X  Factor_Y  id_distance  cd_distance
0  1     2     3    2     10        2         9            7
1  2     5     3    5     20        5         18           17
2  7     5     8    6     30        6         23           22
3  19    7     20   7     20        7         1            0

Is this approach feasible, and will it be more time-efficient?

Upvotes: 3

Views: 504

Answers (1)

cs95

Reputation: 402813

Select the id, cd, and Factor columns with filter, then compute the Euclidean distance row-wise on the underlying NumPy arrays.

ids = df.filter(like='id')
cds = df.filter(like='cd')  
factor = df.filter(like='Factor')

df['id_distance'] = ((ids.values - factor.values) ** 2).sum(1) ** .5
df['cd_distance'] = ((cds.values - factor.values) ** 2).sum(1) ** .5

df
   id_X  id_Y  cd_X  cd_Y  Factor_X  Factor_Y  id_distance  cd_distance
0     1     2     3     2        10         2     9.000000          7.0
1     2     5     3     5        20         5    18.000000         17.0
2     7     5     8     6        30         6    23.021729         22.0
3    19     7    20     7        20         7     1.000000          0.0
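The TypeError in the question arises because math.sqrt accepts a single number, not a Series. NumPy functions work element-wise on whole columns instead. As a sketch of how this might scale to many subjects, the loop below assumes every subject follows the `<prefix>_X` / `<prefix>_Y` naming used in the question; the `prefixes` list and the use of `np.hypot` are illustrative choices, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id_X': [1, 2, 7, 19],
    'id_Y': [2, 5, 5, 7],
    'cd_X': [3, 3, 8, 20],
    'cd_Y': [2, 5, 6, 7],
    'Factor_X': [10, 20, 30, 20],
    'Factor_Y': [2, 5, 6, 7],
})

# Assumed naming convention: each subject has columns "<prefix>_X" and "<prefix>_Y".
prefixes = ['id', 'cd']

for prefix in prefixes:
    dx = df[f'{prefix}_X'] - df['Factor_X']
    dy = df[f'{prefix}_Y'] - df['Factor_Y']
    # np.hypot computes sqrt(dx**2 + dy**2) element-wise over the columns.
    df[f'{prefix}_distance'] = np.hypot(dx, dy)

print(df)
```

Each pass of the loop is a vectorized column operation, so the Python-level work grows with the number of subjects rather than the number of rows.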

Upvotes: 1
