Reputation: 967

Calculate distance a dataframe with UTM coordinates in pandas

I have a huge dataframe. The structure data looks like this:

df
ID  Annotation  X           Y
A   Boarding    767513.9918 9425956.2571
A   Alighting   767154.1396 9427584.0004
B   Boarding    767450.5277 9432627.9543
B   Alighting   767495.0101 9426797.1772
C   Boarding    767648.9507 9426442.5497
C   Alighting   767037.0309 9428878.9032
........

The X and Y data used UTM coordinates. I want to calculate the distance between each ID boarding and alighting. My question is quietly similar but different to this question (Distance matrix in Python Pandas). My expected result looks like this:

result
ID  Anotation_1 X_1         Y_1         Anotation_2 X_2         Y_2      Dist
A   Boarding    767513.99   9425956.26  Alighting   767154.14   9427584.00  1667.05
B   Boarding    767450.53   9432627.95  Alighting   767495.01   9426797.18  5830.95
C   Boarding    767648.95   9426442.55  Alighting   767037.03   9428878.90  2512.02
    .......

Thank you for your help.

Upvotes: 0

Answers (3)

anky

Reputation: 75120

I am using unstack():

m=(df.assign(k=(df.groupby('ID').cumcount()+1).astype(str)).
        set_index(['ID','k']).unstack().sort_values(by='k',axis=1))
m.columns=m.columns.map('_'.join)

m=m.assign(Dist=np.sqrt((m.X_1 - m.X_2)**2 + (m.Y_1 - m.Y_2)**2))
print(m)

Upvotes: 2

Serge Ballesta

Reputation: 149125

I would pivot the dataframe:

result = df.pivot('ID', 'Annotation', ['X', 'Y'])

to get

                      X                          Y              
Annotation    Alighting     Boarding     Alighting      Boarding
ID                                                              
A           767154.1396  767513.9918  9.427584e+06  9.425956e+06
B           767495.0101  767450.5277  9.426797e+06  9.432628e+06
C           767037.0309  767648.9507  9.428879e+06  9.426443e+06

Then I would rename the columns and reindex:

ix = result.columns.to_frame()
result.columns = ix['Annotation'] + '_' + ix.iloc[:,0]
result = result.reindex(columns=['Alighting_X', 'Alighting_Y', 'Boarding_X', 'Boarding_Y'])

to get:

    Alighting_X   Alighting_Y   Boarding_X    Boarding_Y
ID                                                      
A   767154.1396  9.427584e+06  767513.9918  9.425956e+06
B   767495.0101  9.426797e+06  767450.5277  9.432628e+06
C   767037.0309  9.428879e+06  767648.9507  9.426443e+06

It is now easy to compute the distance:

result['Dist'] = np.sqrt((result.Alighting_X - result.Boarding_X)**2 + (result.Alighting_Y - result.Boarding_Y)**2)

to finally get:

    Alighting_X   Boarding_X   Alighting_Y    Boarding_Y         Dist
ID                                                                   
A   767154.1396  767513.9918  9.427584e+06  9.425956e+06  1667.045847
B   767495.0101  767450.5277  9.426797e+06  9.432628e+06  5830.946773
C   767037.0309  767648.9507  9.428879e+06  9.426443e+06  2512.023929

Upvotes: 2

Yaniv

Reputation: 839

One way to approach this, assuming the input is clean and correct, would be using groupby:

df = df.groupby('ID').apply(lambda x: pd.Series(x.values[0:2,2:4].flatten()))  # (*)
df.columns=['X_1','Y_1','X_2','Y_2']
#df.reset_index()  # Uncomment if you want 'ID' as a column and not an Index

As for the other columns in your desired result: The Anotation_1 and Anotation_2 are always constant, so I didn't bother to include them. The Dist column -- well, you can calculate it now, given the new columns, or you could change the code above to calculate the distance already while traversing the numbers in step (*) above, thereby changing our code to something like: (here used dummy distance calculation, replace it with yours!)

def my_func(pdf):
    return pd.Series([pdf.values[0,2], pdf.values[0,3], pdf.values[1,2], pdf.values[1,3],
                      np.sqrt((pdf.values[0,2]-pdf.values[1,2])**2+(pdf.values[0,3]-pdf.values[1,3])**2)  # <= your distance calculation goes here...
                     ])
df = df.groupby('ID').apply(my_func)
df.columns=['X_1','Y_1','X_2','Y_2','Dist']
#df.reset_index()  # Uncomment if you want 'ID' as a column and not an Index

Update: If you insist on including those constant columns, you can simply add them later like so: (but why would you? especially if it's a large DataFrame...)

df['Annotation_1'] = 'Boarding'
df['Annotation_2'] = 'Alighting'
# And if you further insist on a specific ordering of the columns, you can go with:
df = df[['Annotation_1', 'X_1', 'Y_1', 'Annotation_2', 'X_2', 'Y_2', 'Dist']]

Upvotes: 1

Calculate distance a dataframe with UTM coordinates in pandas

Answers (3)

Related Questions