Calculation of variance of Geo coordinates

Question

How to calculate the `variance` of location details

Each location has a latitude and a longitude. I am looking for a single value that captures the variance of the locations as a whole (not a separate variance for latitude and for longitude). What is the best way to achieve that?

>>> pdf = pd.DataFrame({'latitude': {0: 47.0, 8: 54.0, 14: 55.0, 15: 39.0, 2: 31.0},
              'longitude': {0: 29.0, 8: 10.0, 14: 36.0, 15: -9.0, 2: 121.0}
             })

>>> pdf
    latitude  longitude
0       47.0       29.0
8       54.0       10.0
14      55.0       36.0
15      39.0       -9.0
2       31.0      121.0

According to the NumPy documentation, np.var either flattens the array and computes a single variance over all values, or computes the variance along a given axis (for example, one value per column):

axis : None or int or tuple of ints, optional
    Axis or axes along which the variance is computed. The default is to compute the variance of the flattened array.
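To make the two behaviours concrete, here is a minimal sketch using the DataFrame from the question. The variable names `coords`, `flat_var` and `per_column_var` are only illustrative. Neither result is the combined measure being asked for: the first pools latitude and longitude values as if they were one series, and the second returns two separate numbers.

```python
import numpy as np
import pandas as pd

pdf = pd.DataFrame({'latitude': {0: 47.0, 8: 54.0, 14: 55.0, 15: 39.0, 2: 31.0},
                    'longitude': {0: 29.0, 8: 10.0, 14: 36.0, 15: -9.0, 2: 121.0}})

# Shape (5, 2): one row per location, columns are latitude and longitude
coords = pdf[['latitude', 'longitude']].to_numpy()

# Default axis=None: the array is flattened, so all 10 numbers are pooled together
flat_var = np.var(coords)

# axis=0: one variance per column, i.e. [var(latitude), var(longitude)]
per_column_var = np.var(coords, axis=0)

print(flat_var)
print(per_column_var)
```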

Expected (just an example)

>>> variance(pdf)
27.9

I would like to understand if the coordinates are close to each other. What is the best possible approach to get a "combined variance"?

blackraven · Accepted Answer

If I understood you correctly, you're looking for a score that describes how close together a group of coordinates is. The higher the score, the further apart the coordinates are spread.

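You could create a new feature by multiplying the mean-centered latitude by the mean-centered longitude (scaled down by 100 here), then use the variance of this new feature as the score to compare different groups of coordinates. Because both columns are centered first, the score does not change when the whole group is shifted to a different location. Let me illustrate with an example: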

import matplotlib.pyplot as plt
import pandas as pd

#these points are closer together
df1 = pd.DataFrame({'latitude': {0: 47.0, 8: 54.0, 14: 55.0, 15: 39.0, 2: 31.0},
                   'longitude': {0: 54.0, 8: 55.0, 14: 39.0, 15: 31.0, 2: 47.0} })
df1['new'] = (df1['latitude']-df1['latitude'].mean()).mul(df1['longitude']-df1['longitude'].mean()).div(100)
score = df1['new'].var()
df1.plot(kind='scatter', x='longitude', y='latitude')

Output score 0.4407372

#these points have the same spread, but at a different location
df2 = pd.DataFrame({'latitude': {0: 147.0, 8: 154.0, 14: 155.0, 15: 139.0, 2: 131.0},
                   'longitude': {0: 154.0, 8: 155.0, 14: 139.0, 15: 131.0, 2: 147.0} })
df2['new'] = (df2['latitude']-df2['latitude'].mean()).mul(df2['longitude']-df2['longitude'].mean()).div(100)
score = df2['new'].var()
df2.plot(kind='scatter', x='longitude', y='latitude')

Output score 0.4407372

#these points are further apart
df3 = pd.DataFrame({'latitude': {0: 14.0, 8: 15.0, 14: 155.0, 15: 13.0, 2: 131.0},
                   'longitude': {0: 15.0, 8: 215.0, 14: 39.0, 15: 131.0, 2: 147.0} })
df3['new'] = (df3['latitude']-df3['latitude'].mean()).mul(df3['longitude']-df3['longitude'].mean()).div(100)
score = df3['new'].var()
df3.plot(kind='scatter', x='longitude', y='latitude')

Output score 2332.5498432
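Since the same calculation is repeated for each DataFrame, it could be wrapped in a small helper. The sketch below is only one way to do that: `spread_score` and its `lat_col`, `lon_col` and `scale` parameters are names made up for this illustration, and `df2` is built as `df1 + 100`, which gives the same values as the shifted example above.

```python
import pandas as pd

def spread_score(df, lat_col='latitude', lon_col='longitude', scale=100):
    """Variance of the product of mean-centered latitude and longitude, divided by `scale`."""
    lat_dev = df[lat_col] - df[lat_col].mean()
    lon_dev = df[lon_col] - df[lon_col].mean()
    # Series.var() uses the sample variance (ddof=1) by default
    return lat_dev.mul(lon_dev).div(scale).var()

# Points that are close together
df1 = pd.DataFrame({'latitude': [47.0, 54.0, 55.0, 39.0, 31.0],
                    'longitude': [54.0, 55.0, 39.0, 31.0, 47.0]})
# Same spread, shifted by 100 in both columns
df2 = df1 + 100
# Points that are further apart
df3 = pd.DataFrame({'latitude': [14.0, 15.0, 155.0, 13.0, 131.0],
                    'longitude': [15.0, 215.0, 39.0, 131.0, 147.0]})

for name, frame in [('df1', df1), ('df2', df2), ('df3', df3)]:
    print(name, spread_score(frame))
```

The first two scores should agree up to floating-point rounding, and the third should be much larger, matching the outputs shown above.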
