Reputation: 145
Imagine an array of numbers called A. For each position in A, you want to find the index of the most recent earlier item with the same value. You could easily do this with a for loop as follows:
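A = c(1, 1, 2, 2, 1, 2, 2)
answer = rep(NA_integer_, length(A)) # preallocate the result
for(i in seq_along(A)){
  # 1:(i-1) needs the parentheses: 1:i-1 parses as (1:i)-1
  if(i > 1 && sum(A[1:(i-1)] == A[i]) > 0){
    answer[i] = max(which(A[1:(i-1)] == A[i]))
  }else{
    answer[i] = NA
  }
}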
But I want to vectorize this for loop (because I'll be applying this principle to a very large data set). I tried using sapply:
answer = sapply(A, FUN = function(x){max(which(A == x))})
As you can see, I need some way of reducing the array to only values that come before x. Any advice?
Upvotes: 1
Views: 81
Reputation: 79228
You can do:
sapply(seq_along(A)-1, function(x)ifelse(any(a<-A[x+1]==A[sequence(x)]),max(which(a)),NA))
[1] NA 1 NA 3 2 4 6
Upvotes: 1
Reputation: 388982
We can use seq_along to loop over the index of each element, then subset the elements before it and get the max index where the value last occurred.
c(NA, sapply(seq_along(A)[-1], function(x) max(which(A[1:(x-1)] == A[x]))))
#[1] NA 1 -Inf 3 2 4 6
We can change the -Inf to NA if needed in that format
inds <- c(NA, sapply(seq_along(A)[-1], function(x) max(which(A[1:(x-1)] == A[x]))))
inds[is.infinite(inds)] <- NA
inds
#[1] NA 1 NA 3 2 4 6
The above method gives a warning because max() is called on an empty vector whenever a value has no earlier match. To remove the warning, we can perform an additional check of the length
c(NA, sapply(seq_along(A)[-1], function(x) {
  inds <- which(A[1:(x-1)] == A[x])
  if (length(inds) > 0)
    max(inds)
  else
    NA
}))
#[1] NA 1 NA 3 2 4 6
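The length check above can also be wrapped in a small reusable function. This is only a sketch: the name last_match_idx is hypothetical, and it swaps sapply for vapply so the result is always an integer vector, with seq_len(i - 1) used so that the first element needs no special case.

```r
# Hypothetical helper: for each position i, the index of the most recent
# earlier element equal to A[i], or NA if there is none.
last_match_idx <- function(A) {
  vapply(seq_along(A), function(i) {
    # seq_len(0) is empty, so i = 1 simply yields no earlier matches
    prev <- which(A[seq_len(i - 1)] == A[i])
    if (length(prev) > 0) max(prev) else NA_integer_
  }, integer(1))
}

A <- c(1, 1, 2, 2, 1, 2, 2)
last_match_idx(A)
#[1] NA  1 NA  3  2  4  6
```

Like the sapply versions, this still rescans the preceding elements for every position, so it is quadratic in the length of A.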
Upvotes: 2
Reputation: 145
Here's a function that I made (based upon Ronak's answer):
lastMatch = function(A){
  uniqueItems = unique(A)
  firstInstances = sapply(uniqueItems, function(x){min(which(A == x))}) # first occurrences get NA
  notFirstInstances = setdiff(seq_along(A), firstInstances)
  lastMatch_notFirstInstances = sapply(notFirstInstances, function(x) max(which(A[1:(x-1)] == A[x])))
  X = rep(NA_integer_, length(A)) # result vector; first occurrences stay NA
  X[notFirstInstances] = lastMatch_notFirstInstances
  return(X)
}
Upvotes: 0
Reputation: 66500
Here's an approach with dplyr which is more verbose, but easier for me to grok. We start by recording the row number, then make a group for each distinct value we encounter, then record the prior matching row within each group.
library(dplyr)
A2 <- A %>%
  as_tibble() %>%
  mutate(row = row_number()) %>%
  group_by(value) %>%
  mutate(last_match = lag(row)) %>%
  ungroup()
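The same group-then-lag idea can be sketched in base R with ave(), which applies a function within each group of equal values and puts the results back in the original order. This is an illustration under assumptions, not part of the answer above: the variable name last_match is made up here, and the anonymous function simply shifts each group's positions down by one.

```r
A <- c(1, 1, 2, 2, 1, 2, 2)

# Group the positions 1..n by the value in A; inside each group,
# every position is replaced by the previous position in that group
# (NA for the group's first occurrence).
last_match <- ave(seq_along(A), A, FUN = function(idx) c(NA, head(idx, -1)))
last_match
#[1] NA  1 NA  3  2  4  6
```

Because each element is visited once per group rather than rescanned for every position, this should scale better on a large vector than the sapply-based versions.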
Upvotes: 1