Reputation: 437
I have a data frame that looks something like this
+-----+------------+-------------+-------------------------+----+----------+----------+
| | Actual_Lat | Actual_Long | Time | ID | Cal_long | Cal_lat |
+-----+------------+-------------+-------------------------+----+----------+----------+
| 0 | 63.433376 | 10.397068 | 2019-09-30 04:48:13.540 | 11 | 10.39729 | 63.43338 |
| 1 | 63.433301 | 10.395846 | 2019-09-30 04:48:18.470 | 11 | 10.39731 | 63.43326 |
| 2 | 63.433259 | 10.394543 | 2019-09-30 04:48:23.450 | 11 | 10.39576 | 63.43323 |
| 3 | 63.433258 | 10.394244 | 2019-09-30 04:48:29.500 | 11 | 10.39555 | 63.43436 |
| 4 | 63.433258 | 10.394215 | 2019-09-30 04:48:35.683 | 11 | 10.39505 | 63.43427 |
| ... | ... | ... | ... | ...| ... | ... |
| 70 | NaN | NaN | NaT | NaN| 10.35826 | 63.43149 |
| 71 | NaN | NaN | NaT | NaN| 10.35809 | 63.43155 |
| 72 | NaN | NaN | NaT | NaN| 10.35772 | 63.43163 |
| 73 | NaN | NaN | NaT | NaN| 10.35646 | 63.43182 |
| 74 | NaN | NaN | NaT | NaN| 10.35536 | 63.43196 |
+-----+------------+-------------+-------------------------+----+----------+----------+
Actual_Lat and Actual_Long contain GPS coordinates recorded by a GPS device. Cal_lat and Cal_long are GPS coordinates obtained from OSRM's API. As you can see, a lot of data is missing in the actual coordinates. I am looking to get a data set such that the difference between Actual_Lat and Cal_lat is zero or at least close to zero. I tried filling the missing values with the destination lat and long, but that results in a huge difference. My question is: how can I fill these values using Python/pandas so that, when the vehicle followed the OSRM estimated path, the difference between the actual lat/long and the estimated lat/long is zero or close to zero? I am new to GIS data sets and have no idea how to deal with them.
EDIT: I am looking for something like this.
+-----+------------+-------------+-------------------------+----------+----------+----------+----------------------+----------------------+
| | Actual_Lat | Actual_Long | Time | Tour ID | Cal_long | Cal_lat | coordinates_diff_Lat | coordinates_diff_Lon |
+-----+------------+-------------+-------------------------+----------+----------+----------+----------------------+----------------------+
| 0 | 63.433376 | 10.397068 | 2019-09-30 04:48:13.540 | 11 | 10.39729 | 63.43338 | -0.000 | -0.000 |
| 1 | 63.433301 | 10.395846 | 2019-09-30 04:48:18.470 | 11 | 10.39731 | 63.43326 | 0.000 | -0.001 |
| 2 | 63.433259 | 10.394543 | 2019-09-30 04:48:23.450 | 11 | 10.39576 | 63.43323 | 0.000 | -0.001 |
| 3 | 63.433258 | 10.394244 | 2019-09-30 04:48:29.500 | 11 | 10.39555 | 63.43436 | -0.001 | -0.001 |
| 4 | 63.433258 | 10.394215 | 2019-09-30 04:48:35.683 | 11 | 10.39505 | 63.43427 | -0.001 | -0.001 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 70 | 63.43000 | 10.35800 | NaT | 115268.0 | 10.35826 | 63.43149 | 0.000 | -0.003 |
| 71 | 63.43025 | 10.35888 | NaT | 115268.0 | 10.35809 | 63.43155 | 0.000 | -0.003 |
| 72 | 63.43052 | 10.35713 | NaT | 115268.0 | 10.35772 | 63.43163 | 0.000 | -0.002 |
| 73 | 63.43159 | 10.35633 | NaT | 115268.0 | 10.35646 | 63.43182 | 0.000 | -0.001 |
| 74 | 63.43197 | 10.35537 | NaT | 115268.0 | 10.35536 | 63.43196 | 0.000 | 0.000 |
+-----+------------+-------------+-------------------------+----------+----------+----------+----------------------+----------------------+
Note that 63.43197,10.35537 is the destination and 63.433376,10.397068 is the starting position. All these points represent road coordinates.
Upvotes: 1
Views: 624
Reputation: 2810
IIUC, you need something like this:
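I am taking the columns out of df as lists:
# assumes only the known (non-NaN) GPS fixes are kept from the actual column
actual_lat = df['Actual_Lat'].dropna().tolist()
cal_lat = df['Cal_lat'].tolist()
# ratio of route points to known GPS fixes
div = float(len(cal_lat)) / float(len(actual_lat))
new_l = []
for i in range(len(cal_lat)):
    # repeat each known value so the list spans every route point
    new_l.append(actual_lat[int(i / div)])
print(new_l)
print(len(new_l))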
Do the same with the longitude columns. Since these are GPS points, you can limit the comparison to an accuracy of about 3 decimal places when taking the difference. With that in mind, starting from the first Actual_Lat and Actual_Long values, if the next value repeats the previous one, the difference will not be much greater. Hopefully that makes sense and gives you a solution.
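For reference, here is a minimal, self-contained sketch of the same stretching idea applied to both coordinate columns. The helper name stretch_to_length and the four-row sample frame are made up for illustration; only the column names come from the question. It uses integer arithmetic for the index mapping, which should give the same result as int(i/div) above.

```python
import numpy as np
import pandas as pd


def stretch_to_length(values, target_len):
    """Repeat each known value so the result has target_len entries."""
    values = list(values)
    # i * len(values) // target_len maps row i onto an index of the shorter list
    return [values[i * len(values) // target_len] for i in range(target_len)]


# Hypothetical sample: two known GPS fixes, four OSRM route points
df = pd.DataFrame({
    "Actual_Lat": [63.4334, 63.4333, np.nan, np.nan],
    "Actual_Long": [10.3971, 10.3958, np.nan, np.nan],
    "Cal_lat": [63.4334, 63.4333, 63.4332, 63.4320],
    "Cal_long": [10.3973, 10.3973, 10.3958, 10.3554],
})

for col in ["Actual_Lat", "Actual_Long"]:
    known = df[col].dropna()          # keep only the recorded fixes
    df[col] = stretch_to_length(known, len(df))

print(df)
```

Note that this only repeats the recorded fixes across the frame; it does not move them toward the OSRM points, so the difference for the filled rows can still be large if the vehicle travelled far after the last fix.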
Upvotes: 1
Reputation: 40
You need pandas.DataFrame.where.
Let's say your dataframe is df, then you can do:
df.Actual_Lat = df.Actual_Lat.where(~df.Actual_Lat.isna(), df.Cal_lat)
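As a rough illustration, the same call can be applied to both coordinate columns and followed by the difference columns from the question's EDIT. The four-row sample frame below is invented for the sketch; the column names are taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the last two rows have no recorded GPS fix
df = pd.DataFrame({
    "Actual_Lat": [63.433376, 63.433301, np.nan, np.nan],
    "Actual_Long": [10.397068, 10.395846, np.nan, np.nan],
    "Cal_lat": [63.43338, 63.43326, 63.43182, 63.43196],
    "Cal_long": [10.39729, 10.39731, 10.35646, 10.35536],
})

# Keep the recorded value where it exists, otherwise take the OSRM value
df["Actual_Lat"] = df["Actual_Lat"].where(df["Actual_Lat"].notna(), df["Cal_lat"])
df["Actual_Long"] = df["Actual_Long"].where(df["Actual_Long"].notna(), df["Cal_long"])

# Difference columns, rounded to 3 decimal places
df["coordinates_diff_Lat"] = (df["Actual_Lat"] - df["Cal_lat"]).round(3)
df["coordinates_diff_Lon"] = (df["Actual_Long"] - df["Cal_long"]).round(3)

print(df)
```

Rows that were missing end up with a difference of exactly zero, because the filled value is the OSRM coordinate itself. This assumes the vehicle really followed the estimated path on those rows.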
Upvotes: 0