Reputation: 7
I have a gpx file that I am manipulating. I would like to add a column to it that describes the terrain based on another dataframe that lists the terrain by distance. Here are the dataframes:
GPS_df
lat lon alt time dist total_dist
0 44.565335 -123.312517 85.314 2020-09-07 14:00:01 0.000000 0.000000
1 44.565336 -123.312528 85.311 2020-09-07 14:00:02 0.000547 0.000547
2 44.565335 -123.312551 85.302 2020-09-07 14:00:03 0.001137 0.001685
3 44.565332 -123.312591 85.287 2020-09-07 14:00:04 0.001985 0.003670
4 44.565331 -123.312637 85.270 2020-09-07 14:00:05 0.002272 0.005942
... ... ... ... ... ... ...
12481 44.565576 -123.316116 85.517 2020-09-07 17:28:14 0.002318 26.091324
12482 44.565559 -123.316072 85.587 2020-09-07 17:28:15 0.002469 26.093793
12483 44.565554 -123.316003 85.637 2020-09-07 17:28:16 0.003423 26.097217
12484 44.565535 -123.315966 85.697 2020-09-07 17:28:17 0.002249 26.099465
12485 44.565521 -123.315929 85.700 2020-09-07 17:28:18 0.002066 26.101532
terrain_df:
dist terrain
0 0.0 Start
1 3.0 Road
2 5.0 Gravel
3 8.0 Trail-hard
4 12.0 Gravel
5 16.0 Trail-med
6 18.0 Road
7 22.0 Gravel
8 23.0 Trail-easy
9 26.2 Road
I have come up with the following code, that works, but I would like to make it more efficient by eliminating the looping:
GPS_df['terrain']=""
i=0
for j in range(0,len(GPS_df)):
if GPS_df.total_dist[j]<= terrain_df.dist[i]:
GPS_df.terrain[j]=terrain_df.terrain[i]
else:
i=i+1
GPS_df.terrain[j]=terrain_df.terrain[i]
I have tried a half a dozen different ways, but none seem to work correctly. I am sure there is an easy way to do it, but I just don't have the skills and experience to figure it out so far, so I am looking for some help. I tried using cut and add the labels, but cut requires unique labels. I could use cut and then replace the generated intervals with labels in another way, but that doesn't seems like the best approach either. I also tried this approach that I found from another question, but it filled the column with the first label only (I also am having trouble understanding how it works, so it makes it tough to troubleshoot).
bins = terrain_df['dist']
names = terrain_df['terrain']
d = dict(enumerate(names, 1))
GPS_df['terrain2'] = np.vectorize(d.get)(np.digitize(GPS_df['dist'], bins))
Appreciate any guidance that you can give me.
Upvotes: 0
Views: 40
Reputation: 2300
I believe pandas.merge_asof
should do the trick. Try:
result = pd.merge_asof(left=GPS_df, right=terrain_df, left_on='total_dist', right_on='dist', direction='backward')
Upvotes: 1