Reputation: 6179

How to effectively use which() to compare one column and a row in r?

Data

I have multiple vehicles data set, each with a unique ID, Vehicle.ID2. Following is part of the data for 1 vehicle only:

df <- structure(list(Vehicle.ID2 = c("4-2", "4-2", "4-2", "4-2", "4-2", 
"4-2", "4-2", "4-2", "4-2", "4-2", "4-2", "4-2", "4-2", "4-2", 
"4-2", "4-2", "4-2", "4-2", "4-2", "4-2"), Time = c(3, 3.2, 3.4, 
3.6, 3.8, 4, 4.2, 4.4, 4.6, 4.8, 5, 5.2, 5.4, 5.6, 5.8, 6, 6.2, 
6.4, 6.6, 6.8), yposition = c(3.451, 7.357, 11.264, 15.171, 19.077, 
22.984, 26.89, 30.797, 34.704, 38.61, 42.517, 46.423, 50.33, 
54.236, 58.143, 62.05, 65.956, 69.863, 73.769, 77.676), LeadVehyposition2 = c(55.043, 
NA, 64.098, 68.626, 73.153, 77.681, 82.209, 86.736, 91.264, 95.791, 
100.319, 104.847, 109.374, 113.902, 118.429, 122.957, 127.485, 
132.012, 136.54, 141.067)), .Names = c("Vehicle.ID2", "Time", 
"yposition", "LeadVehyposition2"), class = c("tbl_df", "data.frame"
), row.names = c(NA, -20L))

What I want to do

I want to compare the LeadVehyposition2 with yposition in the df and output the first Time where the yposition is greater than or equal to LeadVehyposition2. With 1 vehicle, I can do this by using the following code for the 1st value in LeadVehyposition2:

df$Time[head(which(df$yposition>=55.043),1)]
> 5.8

Here, the first value in the LeadVehyposition2 was 55.043 and I compared it with all the values in the yposition. I want to do the same with all values in LeadVehyposition2. Following is the code that didn't work for the whole data set (multiple Vehicle IDs):

library(dplyr)
mydata %>%
group_by(Vehicle.ID2) %>% 
mutate(Time.PET = Time[head(which(yposition>=LeadVehyposition2),1)]%>%
ungroup()

Problem:

The problem is that using the second piece of code compares the values of yposition and LeadVehyposition2 by rows only. But, the objective is to keep LeadVehyposition2 constant and compare it with the whole column of yposition. How can I solve this problem?

Upvotes: 2

Answers (3)

akuiper

Reputation: 215057

Here is a possible way to do it in base;

df$Time[sapply(df$LeadVehyposition2, function(p) min(which(df$yposition >= p)))]
 [1] 5.8  NA 6.2 6.4 6.6  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA

Or:

with(df, Time[sapply(LeadVehyposition2, function(p) min(which(yposition >= p)))])
 [1] 5.8  NA 6.2 6.4 6.6  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA

To take care of the grouping by Vehicle issue:

df <- df[order(df$Vehicle.ID2, df$Time), ]
do.call(c, sapply(split(df, df$Vehicle.ID2), function(df) 
        with(df, Time[sapply(LeadVehyposition2, function(p) min(which(yposition >= p)))])))

Upvotes: 3

eddi

Reputation: 49448

You can use rolling joins for this:

library(data.table)
setDT(df)

# create an index to be used for matching
df[, idx := 1:.N, by = Vehicle.ID2]

# find the matching index using rolling joins
df[, idx.m := .SD[.SD, on = c('Vehicle.ID2', yposition = 'LeadVehyposition2'), roll = T,
                 idx + 1]][1:5]
#   Vehicle.ID2 Time yposition LeadVehyposition2 idx idx.m
#1:         4-2  3.0     3.451            55.043   1    15
#2:         4-2  3.2     7.357                NA   2    NA
#3:         4-2  3.4    11.264            64.098   3    17
#4:         4-2  3.6    15.171            68.626   4    18
#5:         4-2  3.8    19.077            73.153   5    19

# get the time for each match
df[, Time.PET := Time[idx.m], by = Vehicle.ID2][1:5]
#   Vehicle.ID2 Time yposition LeadVehyposition2 idx idx.m Time.PET
#1:         4-2  3.0     3.451            55.043   1    15      5.8
#2:         4-2  3.2     7.357                NA   2    NA       NA
#3:         4-2  3.4    11.264            64.098   3    17      6.2
#4:         4-2  3.6    15.171            68.626   4    18      6.4
#5:         4-2  3.8    19.077            73.153   5    19      6.6

If yposition and LeadVehyposition2 can be strictly equal I would suggest adding very small (positive) jitter to yposition for the above to work correctly.

Another option, with the latest development version of data.table adding non-equi joins, can be:

library(data.table)
setDT(df)

df[df, on = .(Vehicle.ID2, yposition >= LeadVehyposition2), Time[1], by = .EACHI][1:5]
#   Vehicle.ID2 yposition  V1
#1:         4-2    55.043 5.8
#2:         4-2        NA  NA
#3:         4-2    64.098 6.2
#4:         4-2    68.626 6.4
#5:         4-2    73.153 6.6

Which reads - join df with itself on columns where Vehicle.ID2 is the same and yposition is larger than or equal to LeadVehyposition2, then take the first Time for each "i" (aka the first argument of [.data.table).

And you can of course assign that as a column:

df[, Time.PET := .SD[.SD, on = .(Vehicle.ID2, yposition >= LeadVehyposition2),
                    Time[1], by = .EACHI]$V1]

NOTE: both answers assume that yposition is already sorted in increasing order.

Upvotes: 2

SymbolixAU

Reputation: 26258

A data.table method could be to join the df to itself, then take the minimum Time with a positive difference between yposition and LeadVehyposition2.

library(data.table)
setDT(df)

res <- df[ df[, .(Vehicle.ID2, Time, yposition)], on = c("Vehicle.ID2"), allow.cartesian=T][i.yposition - LeadVehyposition2 > 0, .(min(i.Time)), by = .(Vehicle.ID2, Time, LeadVehyposition2)]
res
#     Vehicle.ID2 Time LeadVehyposition2  V1
# 1:         4-2  3.0            55.043 5.8
# 2:         4-2  3.4            64.098 6.2
# 3:         4-2  3.6            68.626 6.4
# 4:         4-2  3.8            73.153 6.6

Joining this back onto df will add the extra column onto the original data

res[df, on = c("Vehicle.ID2","Time","LeadVehyposition2")]

#      Vehicle.ID2 Time LeadVehyposition2  V1 yposition
#  1:         4-2  3.0            55.043 5.8     3.451
#  2:         4-2  3.2                NA  NA     7.357
#  3:         4-2  3.4            64.098 6.2    11.264
#  4:         4-2  3.6            68.626 6.4    15.171
#  5:         4-2  3.8            73.153 6.6    19.077
#  6:         4-2  4.0            77.681  NA    22.984
# ...
# 17:         4-2  6.2           127.485  NA    65.956
# 18:         4-2  6.4           132.012  NA    69.863
# 19:         4-2  6.6           136.540  NA    73.769
# 20:         4-2  6.8           141.067  NA    77.676