Rfanatic

Reputation: 2280

How to identify the indexes of pairs of numbers in a data.frame?

I have a large data.frame:

t1   t2   t3   t4   t5   t6   t7   t8
7    15   30   37    4   11   30   37
4    31   44   30   37  39    44   18
3    49   39   34   44   43   26   24
4    31   26   33   12   47   37   15
3    27   34   23   30   30   37    4
9    46   39   34    8   43   26   24

For each row, I would like to identify specific sequences of numbers (e.g. entered by the user) in columns t1 to t8.

A sequence consists of numbers that follow each other in chronological order (time is defined by t1...t8).

Example of a sequence:

30, 37 occurring at [t3, t4] as well as [t7, t8]

As you can see from the example above, I want the indexes of the start and end columns (i.e. times t1...t8) and the number of times the sequence occurs.

Desired input:

Please specify your sequence: 30 37 

Desired output:

The timing of 30 37 is:

     [t3] to [t4] 
     [t7] to [t8] 
     [t4] to [t5] 

My question is how to write a function that identifies the indexes of a specific sequence. Any help is welcome.

Below is the code that I want to improve:

apply(m, 1, function(x) {
  u <- unique(x)
  u <- u[sapply(u, function(u) any(diff(which(x == u)) > 1))]
  lapply(setNames(u, u), function(u){ 
      ind <- which(x == u)
      lapply(seq(length(ind) - 1), 
             function(i) x[seq(ind[i] + 1, ind[i + 1] - 1)])
  })
})

Upvotes: 1

Views: 126

Answers (2)

Niels Holst

Reputation: 626

An alternative solution with the plyr package and without do.call:

library(plyr)

obs = read.table(text=
  "t1   t2   t3   t4   t5   t6   t7   t8
  7    15   30   37    4   11   30   37
  4    31   44   30   37  39    44   18
  3    49   39   34   44   43   26   24
  4    31   26   33   12   47   37   15
  3    27   34   23   30   30   37    4
  9    46   39   34    8   43   26   24",
  header=TRUE)

# Find target in one row
f = function(v, target) {
  n = length(v)
  m = length(target)
  res = integer(0)  # start positions of matches; empty if none are found
  for (i in 1:(n-m+1)) {
    if (all(target==v[i:(i+m-1)])) res = c(res,i)
  }
  data.frame(From=res, To=res+m-1)
}

# Find target in all rows
find_matches = function(df, target) {
  df$Row = 1:nrow(df)
  M = adply(df, 1, f, target=target)
  M[, (ncol(M)-2):ncol(M)]
}

# Test
find_matches(obs, c(30,37))
#  Row From To
#1   1    3  4
#2   1    7  8
#3   2    4  5
#4   5    6  7
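
If the output should look like the listing asked for in the question, the same sliding-window comparison can be written in base R. This is only a minimal sketch: `find_starts` and `format_matches` are hypothetical helper names introduced here for illustration, and it assumes all columns of the data frame are numeric time points.

```r
# Example data from the question
obs <- read.table(text =
  "t1 t2 t3 t4 t5 t6 t7 t8
   7 15 30 37  4 11 30 37
   4 31 44 30 37 39 44 18
   3 49 39 34 44 43 26 24
   4 31 26 33 12 47 37 15
   3 27 34 23 30 30 37  4
   9 46 39 34  8 43 26 24",
  header = TRUE)

# Hypothetical helper: start positions of `target` inside the vector `v`
find_starts <- function(v, target) {
  n <- length(v)
  m <- length(target)
  if (m == 0 || m > n) return(integer(0))
  starts <- seq_len(n - m + 1)
  # Keep the starts where every element of the window equals the target
  hits <- vapply(starts, function(i) all(v[i:(i + m - 1)] == target), logical(1))
  starts[hits]
}

# Hypothetical wrapper: one "[from] to [to]" label per match, row by row
format_matches <- function(df, target) {
  out <- character(0)
  for (r in seq_len(nrow(df))) {
    v <- unlist(df[r, ], use.names = FALSE)
    s <- find_starts(v, target)
    out <- c(out, sprintf("[%s] to [%s]",
                          names(df)[s], names(df)[s + length(target) - 1]))
  }
  out
}

format_matches(obs, c(30, 37))
# [1] "[t3] to [t4]" "[t7] to [t8]" "[t4] to [t5]" "[t6] to [t7]"
```

In an interactive session the target could be read from the user with something like `scan(text = readline("Please specify your sequence: "))`, and `length(format_matches(obs, target))` gives the number of occurrences.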

Upvotes: 1

Ronak Shah

Reputation: 388797

Here is one function that may help. For every row, we paste every element with its next element and check whether the result matches the pair of numbers passed in. The function returns a data frame with the row number and the column names where a match is found.

return_match <- function(df, x, y) {
   #Paste the numbers to match
   concat_str <- paste(x, y, sep = "-")
   #For every row in dataframe
   do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
       #Subset the row
       x <- df[i, ]
       #Paste every value with its next value and compare it with concat_str
       inds = paste(x[-length(x)], x[-1L], sep = "-") == concat_str
       if(any(inds)) {
         #Get the column numbers to match
         row <- which(inds)
         #subset the column name and add row number
         transform(as.data.frame(t(sapply(row, function(y) 
                   names(df)[c(y, y + 1)]))), row = i)
       }
    }))
}


return_match(df, 30, 37)
#  V1 V2 row
#1 t3 t4   1
#2 t7 t8   1
#3 t4 t5   2
#4 t6 t7   5


return_match(df, 39, 34)
#  V1 V2 row
#1 t3 t4   3
#2 t3 t4   6
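
To see what the pasting step does, it may help to run it on a single row. The sketch below uses row 1 of the example data typed in by hand; the variable names `row1`, `pairs` and `starts` are only for illustration.

```r
# Row 1 of the example data, entered by hand for illustration
row1 <- c(t1 = 7, t2 = 15, t3 = 30, t4 = 37, t5 = 4, t6 = 11, t7 = 30, t8 = 37)

# Paste each value with its successor: "7-15" "15-30" "30-37" ...
pairs <- paste(row1[-length(row1)], row1[-1L], sep = "-")

# Positions where the pasted pair equals the target "30-37"
starts <- which(pairs == paste(30, 37, sep = "-"))
starts
# [1] 3 7

names(row1)[starts]      # start columns
# [1] "t3" "t7"
names(row1)[starts + 1]  # end columns
# [1] "t4" "t8"
```

The separator matters: without it, pairs such as 3, 04 and 30, 4 would both paste to "304" and could be confused.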

Upvotes: 0
