Reputation: 5069
With a dataframe like the one below:
Time Lat Long Val
19:24:50.925 35.61068333 139.6304283 -54.6
19:24:51.022 35.61068333 139.6304283 -52.9
19:24:51.118 35.61068333 139.6304283 -52.6
19:24:51.215 35.61068394 139.6304283 -52.2
19:24:51.312 35.61068455 139.6304283 -49.3
19:24:51.409 35.61068515 139.6304283 -52.1
19:24:51.506 35.61068576 139.6304283 -52.2
19:24:51.603 35.61068636 139.6304283 -51.3
19:24:51.699 35.61068697 139.6304283 -51.8
19:24:51.796 35.61068758 139.6304283 -52.6
19:24:51.892 35.61068818 139.6304283 -53.5
19:24:51.990 35.61068879 139.6304283 -51.8
19:24:52.087 35.61068939 139.6304283 -54.1
19:24:52.183 35.61069042 139.6304283 -51.8
19:24:52.281 35.61069083 139.6304283 -53.5
19:24:52.378 35.61069125 139.6304283 -55.6
19:24:52.474 35.61069222 139.6304283 -53.2
19:24:52.571 35.61069278 139.6304283 -50.8
19:24:52.668 35.61069333 139.6304283 -54
The Lat and Long columns together hold the geographic coordinates of each location, and the Val column holds the measurement of some metric at that location. What I need to do is aggregate the Val column (mean) every 0.005 meters: starting with the first location (lat/long) as the reference, find the rows falling within 0.005 meters of it, average their Vals, and then repeat from the next location that is outside the 0.005 m limit, so that the result looks like the table below (a rough sketch of the looping I have in mind follows the expected output). I looked at pandas.Grouper but am not sure how to use it to achieve this.
Lat Long Val Count_of_records
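For reference, this is a rough sketch of the looping I have in mind, assuming the table above is already loaded into a dataframe df and that geopy is available for the geodesic distance (the group_by_distance helper is just illustrative). I was hoping for a more idiomatic pandas way of doing this:
import geopy.distance

def group_by_distance(df, limit_m=0.005):
    """Assign a group id per row: a row within limit_m metres of the current
    reference stays in its group; the first row beyond it starts a new group
    and becomes the new reference."""
    groups, ref, group_id = [], None, -1
    for lat, lon in zip(df['Lat'], df['Long']):
        if ref is None or geopy.distance.geodesic(ref, (lat, lon)).m > limit_m:
            ref = (lat, lon)   # this row becomes the new reference
            group_id += 1
        groups.append(group_id)
    return groups

df['group'] = group_by_distance(df)
# Named aggregation (pandas >= 0.25): reference coordinates, mean Val, record count
result = (df.groupby('group')
            .agg(Lat=('Lat', 'first'), Long=('Long', 'first'),
                 Val=('Val', 'mean'), Count_of_records=('Val', 'size'))
            .reset_index(drop=True))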
Upvotes: 0
Views: 963
Reputation: 5686
Apologies, I'm still having a little trouble understanding the question. Hopefully this is it.
Sure, this solution is a little verbose, but I think it makes the logic clearer and easier to maintain in the future.
import pandas as pd
from io import StringIO
import geopy.distance
import numpy as np
# Setup data as in MWE
df = pd.read_fwf(StringIO("""
Time Lat Long Val
19:24:50.925 35.61068333 139.6304283 -54.6
19:24:51.022 35.61068333 139.6304283 -52.9
19:24:51.118 35.61068333 139.6304283 -52.6
19:24:51.215 35.61068394 139.6304283 -52.2
19:24:51.312 35.61068455 139.6304283 -49.3
19:24:51.409 35.61068515 139.6304283 -52.1
19:24:51.506 35.61068576 139.6304283 -52.2
19:24:51.603 35.61068636 139.6304283 -51.3
19:24:51.699 35.61068697 139.6304283 -51.8
19:24:51.796 35.61068758 139.6304283 -52.6
19:24:51.892 35.61068818 139.6304283 -53.5
19:24:51.990 35.61068879 139.6304283 -51.8
19:24:52.087 35.61068939 139.6304283 -54.1
19:24:52.183 35.61069042 139.6304283 -51.8
19:24:52.281 35.61069083 139.6304283 -53.5
19:24:52.378 35.61069125 139.6304283 -55.6
19:24:52.474 35.61069222 139.6304283 -53.2
19:24:52.571 35.61069278 139.6304283 -50.8
19:24:52.668 35.61069333 139.6304283 -54"""), header=1)
# Extract longitude and latitude from df
coords = df[['Lat', 'Long']].values
# Compute the distances between consecutive rows of the dataframe
consec_dist = [geopy.distance.geodesic(*i).m for i in zip(coords[:-1], coords[1:])]
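# consec_dist[i] is the distance in metres from row i to row i + 1 (one fewer entry than df has rows)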
# Set up column in which to store our aggregates
df['mean'] = np.zeros(df.shape[0])
# The threshold distance in metres
d = 0.005
# Loop over the rows one at a time
for row in range(df.shape[0] - 1):
    # From comments:
    # if less than limit distance { store Val of current row for aggregation}
    # else { perform aggregation on stored Vals and change reference to current row } repeat
    if consec_dist[row] < d:
        df.loc[row, 'mean'] = df.loc[row, 'Val']
    else:
        df.loc[row, 'mean'] = df.loc[row:row + 1, 'Val'].mean()
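# Note: the loop stops before the last row, so its 'mean' keeps the initial 0.0 (row 18 in the output below)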
This gave me the following:
In [2]: df
Out[2]:
Time Lat Long Val mean
0 19:24:50.925 35.610683 139.630428 -54.6 -54.60
1 19:24:51.022 35.610683 139.630428 -52.9 -52.90
2 19:24:51.118 35.610683 139.630428 -52.6 -52.40
3 19:24:51.215 35.610684 139.630428 -52.2 -50.75
4 19:24:51.312 35.610685 139.630428 -49.3 -50.70
5 19:24:51.409 35.610685 139.630428 -52.1 -52.15
6 19:24:51.506 35.610686 139.630428 -52.2 -51.75
7 19:24:51.603 35.610686 139.630428 -51.3 -51.55
8 19:24:51.699 35.610687 139.630428 -51.8 -52.20
9 19:24:51.796 35.610688 139.630428 -52.6 -53.05
10 19:24:51.892 35.610688 139.630428 -53.5 -52.65
11 19:24:51.990 35.610689 139.630428 -51.8 -52.95
12 19:24:52.087 35.610689 139.630428 -54.1 -52.95
13 19:24:52.183 35.610690 139.630428 -51.8 -52.65
14 19:24:52.281 35.610691 139.630428 -53.5 -54.55
15 19:24:52.378 35.610691 139.630428 -55.6 -54.40
16 19:24:52.474 35.610692 139.630428 -53.2 -52.00
17 19:24:52.571 35.610693 139.630428 -50.8 -52.40
18 19:24:52.668 35.610693 139.630428 -54.0 0.00
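If what you are ultimately after is the Lat / Long / mean Val / Count_of_records layout from your expected output, one option (just a sketch; note it starts a new group whenever the step from the previous row exceeds the limit, which only approximates your fixed-reference rule) is to turn consec_dist into group labels and aggregate with groupby:
# Start a new group whenever the step from the previous row exceeds the limit
breaks = pd.Series([False] + [dist > d for dist in consec_dist])
df['group'] = breaks.cumsum()

# Named aggregation (pandas >= 0.25): first coordinates, mean Val, record count per group
agg = (df.groupby('group')
         .agg(Lat=('Lat', 'first'), Long=('Long', 'first'),
              Val=('Val', 'mean'), Count_of_records=('Val', 'size'))
         .reset_index(drop=True))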
Upvotes: 1