user3206440

Reputation: 5069

Python pandas - aggregate column on latitude, longitude based distance

With dataframe like below,

Time            Lat         Long        Val
19:24:50.925    35.61068333 139.6304283 -54.6   
19:24:51.022    35.61068333 139.6304283 -52.9   
19:24:51.118    35.61068333 139.6304283 -52.6   
19:24:51.215    35.61068394 139.6304283 -52.2
19:24:51.312    35.61068455 139.6304283 -49.3
19:24:51.409    35.61068515 139.6304283 -52.1
19:24:51.506    35.61068576 139.6304283 -52.2
19:24:51.603    35.61068636 139.6304283 -51.3
19:24:51.699    35.61068697 139.6304283 -51.8
19:24:51.796    35.61068758 139.6304283 -52.6
19:24:51.892    35.61068818 139.6304283 -53.5
19:24:51.990    35.61068879 139.6304283 -51.8
19:24:52.087    35.61068939 139.6304283 -54.1
19:24:52.183    35.61069042 139.6304283 -51.8
19:24:52.281    35.61069083 139.6304283 -53.5
19:24:52.378    35.61069125 139.6304283 -55.6
19:24:52.474    35.61069222 139.6304283 -53.2
19:24:52.571    35.61069278 139.6304283 -50.8
19:24:52.668    35.61069333 139.6304283 -54

The Lat and Long columns together hold the geographic coordinates of each location, and the Val column holds the measurement of some metric at that location. What I need to do is aggregate the Val column (mean) every 0.005 meters: starting with the first location (lat/long) as the reference, find the rows falling within 0.005 meters of it and average their Vals, then repeat from the next location (the first one outside the 0.005 m limit), so that the result looks like the layout below. I looked at pandas.Grouper but am not sure how to use it to achieve this.

Lat Long Val Count_of_records
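One self-contained sketch of that greedy grouping, for illustration only: the helper names `haversine_m` and `aggregate_by_distance` are made up here, and a plain haversine formula stands in for a proper geodesic distance (such as geopy's).

```python
import math

import pandas as pd


def haversine_m(p1, p2):
    """Great-circle distance in metres between two (lat, long) pairs
    (a stand-in for a geodesic distance such as geopy.distance.geodesic)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p1, *p2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(a))


def aggregate_by_distance(df, limit_m=0.005):
    # Greedy pass: open a new group whenever a row falls more than
    # limit_m metres from the current reference location, which then
    # becomes the new reference.
    labels, ref, gid = [], None, -1
    for point in zip(df['Lat'], df['Long']):
        if ref is None or haversine_m(ref, point) > limit_m:
            gid += 1
            ref = point
        labels.append(gid)
    return (df.groupby(pd.Series(labels, index=df.index))
              .agg(Lat=('Lat', 'first'), Long=('Long', 'first'),
                   Val=('Val', 'mean'), Count_of_records=('Val', 'size'))
              .reset_index(drop=True))
```

Each output row then carries the reference coordinates, the mean of Val over the group, and the group's row count, matching the layout above.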

Upvotes: 0

Views: 963

Answers (1)

jwalton

Reputation: 5686

Apologies, I'm still having a little trouble understanding the question. Hopefully this is it.

Sure, this solution is a little verbose, but I think it makes the logic clearer and easier to maintain in the future.

import pandas as pd
from io import StringIO
import geopy.distance
import numpy as np

# Setup data as in MWE
df = pd.read_fwf(StringIO("""
    Time            Lat         Long        Val
19:24:50.925    35.61068333 139.6304283 -54.6   
19:24:51.022    35.61068333 139.6304283 -52.9   
19:24:51.118    35.61068333 139.6304283 -52.6   
19:24:51.215    35.61068394 139.6304283 -52.2
19:24:51.312    35.61068455 139.6304283 -49.3
19:24:51.409    35.61068515 139.6304283 -52.1
19:24:51.506    35.61068576 139.6304283 -52.2
19:24:51.603    35.61068636 139.6304283 -51.3
19:24:51.699    35.61068697 139.6304283 -51.8
19:24:51.796    35.61068758 139.6304283 -52.6
19:24:51.892    35.61068818 139.6304283 -53.5
19:24:51.990    35.61068879 139.6304283 -51.8
19:24:52.087    35.61068939 139.6304283 -54.1
19:24:52.183    35.61069042 139.6304283 -51.8
19:24:52.281    35.61069083 139.6304283 -53.5
19:24:52.378    35.61069125 139.6304283 -55.6
19:24:52.474    35.61069222 139.6304283 -53.2
19:24:52.571    35.61069278 139.6304283 -50.8
19:24:52.668    35.61069333 139.6304283 -54"""), header=1)

# Extract longitude and latitude from df
coords = df[['Lat', 'Long']].values
# Compute the distances between consecutive rows of the dataframe
consec_dist = [geopy.distance.geodesic(*i).m for i in zip(coords[:-1], coords[1:])]

# Set up column in which to store our aggregates
df['mean'] = np.zeros(df.shape[0])

# The threshold distance
d = 0.005

# Loop over the rows one at a time
for row in range(df.shape[0] - 1):

    # From comments:
    # if less than limit distance { store Val of current row for aggregation}
    # else { perform aggregation on stored Vals and change reference to current row } repeat

    if consec_dist[row] < d:
        df.loc[row, 'mean'] = df.loc[row, 'Val']
    else:
        df.loc[row, 'mean'] = df.loc[row:row + 1, 'Val'].mean()

This gave me the following:

In [2]: df
Out[2]:
            Time        Lat        Long   Val   mean
0   19:24:50.925  35.610683  139.630428 -54.6 -54.60
1   19:24:51.022  35.610683  139.630428 -52.9 -52.90
2   19:24:51.118  35.610683  139.630428 -52.6 -52.40
3   19:24:51.215  35.610684  139.630428 -52.2 -50.75
4   19:24:51.312  35.610685  139.630428 -49.3 -50.70
5   19:24:51.409  35.610685  139.630428 -52.1 -52.15
6   19:24:51.506  35.610686  139.630428 -52.2 -51.75
7   19:24:51.603  35.610686  139.630428 -51.3 -51.55
8   19:24:51.699  35.610687  139.630428 -51.8 -52.20
9   19:24:51.796  35.610688  139.630428 -52.6 -53.05
10  19:24:51.892  35.610688  139.630428 -53.5 -52.65
11  19:24:51.990  35.610689  139.630428 -51.8 -52.95
12  19:24:52.087  35.610689  139.630428 -54.1 -52.95
13  19:24:52.183  35.610690  139.630428 -51.8 -52.65
14  19:24:52.281  35.610691  139.630428 -53.5 -54.55
15  19:24:52.378  35.610691  139.630428 -55.6 -54.40
16  19:24:52.474  35.610692  139.630428 -53.2 -52.00
17  19:24:52.571  35.610693  139.630428 -50.8 -52.40
18  19:24:52.668  35.610693  139.630428 -54.0   0.00
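The consecutive-row distances above come from the `zip(coords[:-1], coords[1:])` idiom, which couples each row with its successor; a minimal illustration of just that pairing:

```python
coords = [(35.61068333, 139.6304283),
          (35.61068333, 139.6304283),
          (35.61068394, 139.6304283)]

# zip the list against itself shifted by one: pairs[i] couples row i
# with row i + 1, so n coordinates yield n - 1 consecutive pairs.
pairs = list(zip(coords[:-1], coords[1:]))
```

This is also why `consec_dist` has one fewer entry than the dataframe, and why the loop runs over `range(df.shape[0] - 1)` and leaves the final row's mean at 0.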

Upvotes: 1
