Reputation: 7526
I am having difficulty mastering pandas special merge functions like merge_asof()
.
I have two dataframes: coords
- pings from an EV gps, and info
- other EV attributes such as navigation destination and battery level. My objective is to merge them such that the output dataframe row number equals the sum of both dataframes' number of rows. For example:
coords.shape
(10, 3)
coords
ts lat lng
2021-01-02 16:08:24.067971 58.3019 -134.4197
2021-01-06 12:54:18.535681 58.3021 -134.4195
2021-01-08 22:15:35.036423 58.3025 -134.4195
2021-01-16 01:10:39.610540 58.3029 -134.4193
2021-01-27 12:28:45.202376 58.3030 -134.4197
2021-01-30 05:32:09.404525 58.3031 -134.4190
2021-02-08 10:39:19.686159 58.3033 -134.4187
2021-02-15 01:30:16.733921 58.3039 -134.4187
2021-02-16 12:49:55.366025 58.3040 -134.4185
2021-02-19 23:57:57.369978 58.3041 -134.4181
info.shape
(3, 3)
info
ts nav_to battery
2021-01-26 12:47:52.972586 Juneau 90
2021-02-14 23:23:18.186058 Anchorage 50
2021-02-19 07:26:35.357977 Fairbanks 30
info
and coord
should be merged so that the timestamps ts
are in consecutive order, and so that info
rows should be matched to rows in coords
with the nearest timestamp that comes "before". Lastly, nav_to
, battery
, lat
and lng
should should be filled forward. The output from the examples above would be:
output
ts lat lng nav_to battery
2021-01-02 16:08:24.067971 58.3019 -134.4197 None NaN
2021-01-06 12:54:18.535681 58.3021 -134.4195 None NaN
2021-01-08 22:15:35.036423 58.3025 -134.4195 None NaN
2021-01-16 01:10:39.610540 58.3029 -134.4193 None NaN
2021-01-26 12:47:52.972586 58.3029 -134.4193 Juneau 90.0
2021-01-27 12:28:45.202376 58.3030 -134.4197 Juneau 90.0
2021-01-30 05:32:09.404525 58.3031 -134.4190 Juneau 90.0
2021-02-08 10:39:19.686159 58.3033 -134.4187 Juneau 90.0
2021-02-14 23:23:18.186058 58.3033 -134.4187 Anchorage 50.0
2021-02-15 01:30:16.733921 58.3039 -134.4187 Anchorage 50.0
2021-02-16 12:49:55.366025 58.3040 -134.4185 Anchorage 50.0
2021-02-19 07:26:35.357977 58.3040 -134.4185 Fairbanks 30.0
2021-02-19 23:57:57.369978 58.3041 -134.4181 Fairbanks 30.0
I have tried with pd.merge_asof(coords, info, on="ts", direction="forward")
but this does not produce the correct result, it fills backwards and only keeps records from coords
. What are the correct commands to produce the desired result in pandas
?
Upvotes: 1
Views: 244
Reputation: 150735
Try with the default direction='backward'
, then concat
with the second dataframe
(pd.concat([pd.merge_asof(df1, df2, on='ts'), df2])
.sort_values('ts')
)
Output:
ts lat lng nav_to battery
0 2021-01-02 16:08:24.067971 58.3019 -134.4197 NaN NaN
1 2021-01-06 12:54:18.535681 58.3021 -134.4195 NaN NaN
2 2021-01-08 22:15:35.036423 58.3025 -134.4195 NaN NaN
3 2021-01-16 01:10:39.610540 58.3029 -134.4193 NaN NaN
0 2021-01-26 12:47:52.972586 NaN NaN Juneau 90.0
4 2021-01-27 12:28:45.202376 58.3030 -134.4197 Juneau 90.0
5 2021-01-30 05:32:09.404525 58.3031 -134.4190 Juneau 90.0
6 2021-02-08 10:39:19.686159 58.3033 -134.4187 Juneau 90.0
1 2021-02-14 23:23:18.186058 NaN NaN Anchorage 50.0
7 2021-02-15 01:30:16.733921 58.3039 -134.4187 Anchorage 50.0
8 2021-02-16 12:49:55.366025 58.3040 -134.4185 Anchorage 50.0
2 2021-02-19 07:26:35.357977 NaN NaN Fairbanks 30.0
9 2021-02-19 23:57:57.369978 58.3041 -134.4181 Fairbanks 30.0
You can then choose to bfill
the lat
and lng
columns. Or you can just merge_asof
twice:
(pd.concat([pd.merge_asof(df1, df2, on='ts'),
pd.merge_asof(df2, df1, on='ts')
])
.sort_values('ts')
)
Output:
ts lat lng nav_to battery
0 2021-01-02 16:08:24.067971 58.3019 -134.4197 NaN NaN
1 2021-01-06 12:54:18.535681 58.3021 -134.4195 NaN NaN
2 2021-01-08 22:15:35.036423 58.3025 -134.4195 NaN NaN
3 2021-01-16 01:10:39.610540 58.3029 -134.4193 NaN NaN
0 2021-01-26 12:47:52.972586 58.3029 -134.4193 Juneau 90.0
4 2021-01-27 12:28:45.202376 58.3030 -134.4197 Juneau 90.0
5 2021-01-30 05:32:09.404525 58.3031 -134.4190 Juneau 90.0
6 2021-02-08 10:39:19.686159 58.3033 -134.4187 Juneau 90.0
1 2021-02-14 23:23:18.186058 58.3033 -134.4187 Anchorage 50.0
7 2021-02-15 01:30:16.733921 58.3039 -134.4187 Anchorage 50.0
8 2021-02-16 12:49:55.366025 58.3040 -134.4185 Anchorage 50.0
2 2021-02-19 07:26:35.357977 58.3040 -134.4185 Fairbanks 30.0
9 2021-02-19 23:57:57.369978 58.3041 -134.4181 Fairbanks 30.0
Upvotes: 2