Reputation: 6179
I have multiple vehicles data set, each with a unique ID, Vehicle.ID2
. Following is part of the data for 1 vehicle only:
df <- structure(list(Vehicle.ID2 = c("4-2", "4-2", "4-2", "4-2", "4-2",
"4-2", "4-2", "4-2", "4-2", "4-2", "4-2", "4-2", "4-2", "4-2",
"4-2", "4-2", "4-2", "4-2", "4-2", "4-2"), Time = c(3, 3.2, 3.4,
3.6, 3.8, 4, 4.2, 4.4, 4.6, 4.8, 5, 5.2, 5.4, 5.6, 5.8, 6, 6.2,
6.4, 6.6, 6.8), yposition = c(3.451, 7.357, 11.264, 15.171, 19.077,
22.984, 26.89, 30.797, 34.704, 38.61, 42.517, 46.423, 50.33,
54.236, 58.143, 62.05, 65.956, 69.863, 73.769, 77.676), LeadVehyposition2 = c(55.043,
NA, 64.098, 68.626, 73.153, 77.681, 82.209, 86.736, 91.264, 95.791,
100.319, 104.847, 109.374, 113.902, 118.429, 122.957, 127.485,
132.012, 136.54, 141.067)), .Names = c("Vehicle.ID2", "Time",
"yposition", "LeadVehyposition2"), class = c("tbl_df", "data.frame"
), row.names = c(NA, -20L))
I want to compare the LeadVehyposition2
with yposition
in the df
and output the first Time
where the yposition
is greater than or equal to LeadVehyposition2
. With 1 vehicle, I can do this by using the following code for the 1st value in LeadVehyposition2
:
df$Time[head(which(df$yposition>=55.043),1)]
> 5.8
Here, the first value in the LeadVehyposition2
was 55.043 and I compared it with all the values in the yposition
. I want to do the same with all values in LeadVehyposition2
. Following is the code that didn't work for the whole data set (multiple Vehicle IDs):
library(dplyr)
mydata %>%
group_by(Vehicle.ID2) %>%
mutate(Time.PET = Time[head(which(yposition>=LeadVehyposition2),1)]%>%
ungroup()
The problem is that using the second piece of code compares the values of yposition
and LeadVehyposition2
by rows only. But, the objective is to keep LeadVehyposition2
constant and compare it with the whole column of yposition
. How can I solve this problem?
Upvotes: 2
Views: 71
Reputation: 215057
Here is a possible way to do it in base
;
df$Time[sapply(df$LeadVehyposition2, function(p) min(which(df$yposition >= p)))]
[1] 5.8 NA 6.2 6.4 6.6 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Or:
with(df, Time[sapply(LeadVehyposition2, function(p) min(which(yposition >= p)))])
[1] 5.8 NA 6.2 6.4 6.6 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
To take care of the grouping by Vehicle issue:
df <- df[order(df$Vehicle.ID2, df$Time), ]
do.call(c, sapply(split(df, df$Vehicle.ID2), function(df)
with(df, Time[sapply(LeadVehyposition2, function(p) min(which(yposition >= p)))])))
Upvotes: 3
Reputation: 49448
You can use rolling joins for this:
library(data.table)
setDT(df)
# create an index to be used for matching
df[, idx := 1:.N, by = Vehicle.ID2]
# find the matching index using rolling joins
df[, idx.m := .SD[.SD, on = c('Vehicle.ID2', yposition = 'LeadVehyposition2'), roll = T,
idx + 1]][1:5]
# Vehicle.ID2 Time yposition LeadVehyposition2 idx idx.m
#1: 4-2 3.0 3.451 55.043 1 15
#2: 4-2 3.2 7.357 NA 2 NA
#3: 4-2 3.4 11.264 64.098 3 17
#4: 4-2 3.6 15.171 68.626 4 18
#5: 4-2 3.8 19.077 73.153 5 19
# get the time for each match
df[, Time.PET := Time[idx.m], by = Vehicle.ID2][1:5]
# Vehicle.ID2 Time yposition LeadVehyposition2 idx idx.m Time.PET
#1: 4-2 3.0 3.451 55.043 1 15 5.8
#2: 4-2 3.2 7.357 NA 2 NA NA
#3: 4-2 3.4 11.264 64.098 3 17 6.2
#4: 4-2 3.6 15.171 68.626 4 18 6.4
#5: 4-2 3.8 19.077 73.153 5 19 6.6
If yposition
and LeadVehyposition2
can be strictly equal I would suggest adding very small (positive) jitter to yposition
for the above to work correctly.
Another option, with the latest development version of data.table
adding non-equi joins, can be:
library(data.table)
setDT(df)
df[df, on = .(Vehicle.ID2, yposition >= LeadVehyposition2), Time[1], by = .EACHI][1:5]
# Vehicle.ID2 yposition V1
#1: 4-2 55.043 5.8
#2: 4-2 NA NA
#3: 4-2 64.098 6.2
#4: 4-2 68.626 6.4
#5: 4-2 73.153 6.6
Which reads - join df
with itself on columns where Vehicle.ID2
is the same and yposition
is larger than or equal to LeadVehyposition2
, then take the first Time
for each "i" (aka the first argument of [.data.table
).
And you can of course assign that as a column:
df[, Time.PET := .SD[.SD, on = .(Vehicle.ID2, yposition >= LeadVehyposition2),
Time[1], by = .EACHI]$V1]
NOTE: both answers assume that yposition
is already sorted in increasing order.
Upvotes: 2
Reputation: 26258
A data.table
method could be to join the df
to itself, then take the minimum Time
with a positive difference between yposition
and LeadVehyposition2
.
library(data.table)
setDT(df)
res <- df[ df[, .(Vehicle.ID2, Time, yposition)], on = c("Vehicle.ID2"), allow.cartesian=T][i.yposition - LeadVehyposition2 > 0, .(min(i.Time)), by = .(Vehicle.ID2, Time, LeadVehyposition2)]
res
# Vehicle.ID2 Time LeadVehyposition2 V1
# 1: 4-2 3.0 55.043 5.8
# 2: 4-2 3.4 64.098 6.2
# 3: 4-2 3.6 68.626 6.4
# 4: 4-2 3.8 73.153 6.6
Joining this back onto df
will add the extra column onto the original data
res[df, on = c("Vehicle.ID2","Time","LeadVehyposition2")]
# Vehicle.ID2 Time LeadVehyposition2 V1 yposition
# 1: 4-2 3.0 55.043 5.8 3.451
# 2: 4-2 3.2 NA NA 7.357
# 3: 4-2 3.4 64.098 6.2 11.264
# 4: 4-2 3.6 68.626 6.4 15.171
# 5: 4-2 3.8 73.153 6.6 19.077
# 6: 4-2 4.0 77.681 NA 22.984
# ...
# 17: 4-2 6.2 127.485 NA 65.956
# 18: 4-2 6.4 132.012 NA 69.863
# 19: 4-2 6.6 136.540 NA 73.769
# 20: 4-2 6.8 141.067 NA 77.676
Upvotes: 3