Nick

Reputation: 145

Find the Most Recent Match in an Array [R]

Imagine an array of numbers called A. At each position in A, you want to find the index of the most recent earlier item with the same value. You could easily do this with a for loop as follows:

A = c(1, 1, 2, 2, 1, 2, 2)

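answer = rep(NA_integer_, length(A))  # initialize so answer[i] can be assigned

for(i in seq_along(A)){
  # 1:(i-1) needs parentheses: 1:i-1 parses as (1:i)-1
  if(i > 1 && sum(A[1:(i-1)] == A[i]) > 0){
    answer[i] = max(which(A[1:(i-1)] == A[i]))
  }else{
    answer[i] = NA
  }
}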

But I want to vectorize this for loop (because I'll be applying this principle to a very large data set). I tried using sapply:

answer = sapply(A, FUN = function(x){max(which(A == x))})

As you can see, I need some way of reducing the array to only values that come before x. Any advice?

Upvotes: 1

Views: 81

Answers (4)

Onyambu

Reputation: 79228

You can do:

sapply(seq_along(A) - 1, function(x) ifelse(any(a <- A[x + 1] == A[sequence(x)]), max(which(a)), NA))
[1] NA  1 NA  3  2  4  6

Upvotes: 1

Ronak Shah

Reputation: 388982

We can use seq_along to loop over the index of each element, subset A to the elements before that index, and take the max index at which the value last occurred.

c(NA, sapply(seq_along(A)[-1], function(x) max(which(A[1:(x-1)] == A[x]))))
#[1]   NA    1 -Inf    3    2    4    6

If NA is preferred over -Inf, we can replace the infinite values afterwards:

inds <- c(NA, sapply(seq_along(A)[-1], function(x) max(which(A[1:(x-1)] == A[x]))))
inds[is.infinite(inds)] <- NA
inds
#[1] NA  1 NA  3  2  4  6

The above method gives a warning, because max() of an empty vector returns -Inf. To avoid the warning, we can check the length of the matches first:

c(NA, sapply(seq_along(A)[-1], function(x) {
  inds <- which(A[1:(x-1)] == A[x])
  if (length(inds) > 0)
    max(inds)
  else
    NA
}))

#[1] NA  1 NA  3  2  4  6

Upvotes: 2

Nick

Reputation: 145

Here's a function that I made (based upon Ronak's answer):

lastMatch = function(A){
  uniqueItems = unique(A)
  # first occurrence of each value has no earlier match, so it gets NA
  firstInstances = sapply(uniqueItems, function(x){min(which(A == x))})
  notFirstInstances = setdiff(seq_along(A), firstInstances)
  lastMatch_notFirstInstances = sapply(notFirstInstances, function(x) max(which(A[1:(x-1)] == A[x])))
  # preallocate one slot per element (the original 0 x length(A) array was empty)
  X = rep(NA_integer_, length(A))
  X[firstInstances] = NA
  X[notFirstInstances] = lastMatch_notFirstInstances
  return(X)
}

Upvotes: 0

Jon Spring

Reputation: 66500

Here's an approach with dplyr which is more verbose, but easier for me to grok. We start by recording the row number, then group by value, then record the prior row number within each group.

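library(dplyr)
A2 <- tibble(value = A) %>%              # name the column explicitly
  mutate(row = row_number()) %>%         # position of each element in A
  group_by(value) %>%                    # one group per distinct number
  mutate(last_match = lag(row)) %>%      # previous row within the group, NA for the first
  ungroup()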
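
For anyone who would rather avoid the dplyr dependency, the same group-then-lag idea can probably be sketched in base R with ave(). This is only a sketch, not part of the answer above: prev_in_group is a hypothetical helper name, and it assumes A is the numeric vector from the question. Note that ave() groups by converting A to a factor, so values that differ only at very high floating-point precision could end up in the same group.

```r
# Example vector from the question
A <- c(1, 1, 2, 2, 1, 2, 2)

# Hypothetical helper: within one group of positions, shift the positions
# down by one, so each position gets the previous position with the same value
prev_in_group <- function(idx) c(NA_integer_, head(idx, -1))

# ave() splits the positions by the values in A, applies the helper to each
# group, and puts the results back in the original order
last_match <- ave(seq_along(A), A, FUN = prev_in_group)
last_match
#[1] NA  1 NA  3  2  4  6
```

Unlike the sapply versions, this does not rescan the start of the vector for every element, which may matter on a very large data set.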

Upvotes: 1
