tbone
tbone

Reputation: 7

Add column to dataframe based on intervals from another dataframe

I have a gpx file that I am manipulating. I would like to add a column to it that describes the terrain based on another dataframe that lists the terrain by distance. Here are the dataframes:

GPS_df

    lat lon alt time    dist    total_dist
0   44.565335   -123.312517 85.314  2020-09-07 14:00:01 0.000000    0.000000
1   44.565336   -123.312528 85.311  2020-09-07 14:00:02 0.000547    0.000547
2   44.565335   -123.312551 85.302  2020-09-07 14:00:03 0.001137    0.001685
3   44.565332   -123.312591 85.287  2020-09-07 14:00:04 0.001985    0.003670
4   44.565331   -123.312637 85.270  2020-09-07 14:00:05 0.002272    0.005942
... ... ... ... ... ... ...
12481   44.565576   -123.316116 85.517  2020-09-07 17:28:14 0.002318    26.091324
12482   44.565559   -123.316072 85.587  2020-09-07 17:28:15 0.002469    26.093793
12483   44.565554   -123.316003 85.637  2020-09-07 17:28:16 0.003423    26.097217
12484   44.565535   -123.315966 85.697  2020-09-07 17:28:17 0.002249    26.099465
12485   44.565521   -123.315929 85.700  2020-09-07 17:28:18 0.002066    26.101532

terrain_df:

dist    terrain
0   0.0 Start
1   3.0 Road
2   5.0 Gravel
3   8.0 Trail-hard
4   12.0    Gravel
5   16.0    Trail-med
6   18.0    Road
7   22.0    Gravel
8   23.0    Trail-easy
9   26.2    Road

I have come up with the following code, that works, but I would like to make it more efficient by eliminating the looping:

GPS_df['terrain']=""
i=0
for j in range(0,len(GPS_df)):
    if GPS_df.total_dist[j]<= terrain_df.dist[i]:
        GPS_df.terrain[j]=terrain_df.terrain[i]
    else:
        i=i+1
        GPS_df.terrain[j]=terrain_df.terrain[i]

I have tried a half a dozen different ways, but none seem to work correctly. I am sure there is an easy way to do it, but I just don't have the skills and experience to figure it out so far, so I am looking for some help. I tried using cut and add the labels, but cut requires unique labels. I could use cut and then replace the generated intervals with labels in another way, but that doesn't seems like the best approach either. I also tried this approach that I found from another question, but it filled the column with the first label only (I also am having trouble understanding how it works, so it makes it tough to troubleshoot).

bins = terrain_df['dist']
names = terrain_df['terrain']

d = dict(enumerate(names, 1))

GPS_df['terrain2'] = np.vectorize(d.get)(np.digitize(GPS_df['dist'], bins))

Appreciate any guidance that you can give me.

Upvotes: 0

Views: 40

Answers (1)

divingTobi
divingTobi

Reputation: 2300

I believe pandas.merge_asof should do the trick. Try:

result = pd.merge_asof(left=GPS_df, right=terrain_df, left_on='total_dist', right_on='dist', direction='backward')

Upvotes: 1

Related Questions