Reputation: 2280
I have a large data.frame:
t1 t2 t3 t4 t5 t6 t7 t8
7 15 30 37 4 11 30 37
4 31 44 30 37 39 44 18
3 49 39 34 44 43 26 24
4 31 26 33 12 47 37 15
3 27 34 23 30 30 37 4
9 46 39 34 8 43 26 24
For each row, I would like to identify specific sequences of numbers (e.g. entered by the user) in columns t1 to t8.
A sequence consists of numbers that follow each other in chronological order (time is defined by t1...t8).
Example of a sequence:
30, 37
which occurs in the first row at [t3, t4]
and again at [t7, t8].
As the example shows, I want the indices of the start and end columns (i.e. times t1...t8) and the number of times the sequence occurs.
Desired input:
Please specify your sequence: 30 37
Desired output:
The timing of 30 37 is:
[t3] to [t4]
[t7] to [t8]
[t4] to [t5]
My question is how to write a function that identifies the indices of a specific sequence. Any help is welcome.
Below is the code that I want to improve:
apply(m, 1, function(x) {
  u <- unique(x)
  u <- u[sapply(u, function(u) any(diff(which(x == u)) > 1))]
  lapply(setNames(u, u), function(u) {
    ind <- which(x == u)
    lapply(seq(length(ind) - 1),
           function(i) x[seq(ind[i] + 1, ind[i + 1] - 1)])
  })
})
Upvotes: 1
Views: 126
Reputation: 626
An alternative solution using the plyr package, without do.call:
library(plyr)
obs = read.table(text=
"t1 t2 t3 t4 t5 t6 t7 t8
7 15 30 37 4 11 30 37
4 31 44 30 37 39 44 18
3 49 39 34 44 43 26 24
4 31 26 33 12 47 37 15
3 27 34 23 30 30 37 4
9 46 39 34 8 43 26 24",
header=TRUE)
# Find target in one row
f = function(v, target) {
  n = length(v)
  m = length(target)
  res = NULL  # start positions of the matches found so far
  for (i in 1:(n-m+1)) {
    # Compare the window of m values starting at position i with the target
    if (all(target==v[i:(i+m-1)])) res = c(res,i)
  }
  data.frame(From=res, To=res+m-1)
}
# Find target in all rows
find_matches = function(df, target) {
  df$Row = 1:nrow(df)
  M = adply(df, 1, f, target=target)
  # Keep only the last three columns: Row, From, To
  M[, (ncol(M)-2):ncol(M)]
}
# Test
find_matches(obs, c(30,37))
# Row From To
#1 1 3 4
#2 1 7 8
#3 2 4 5
#4 5 6 7
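The From/To indices above can be mapped back to column names to get the output format asked for in the question. A minimal sketch, assuming a result with the Row/From/To columns shown above; `format_matches` is a hypothetical helper name (not part of plyr), and the `matches` data frame below is a hand-typed stand-in for the output of find_matches:

```r
# Hypothetical helper: turn From/To column indices into "[t3] to [t4]" strings.
# `matches` is assumed to have numeric columns From and To, as in the output above;
# `col_names` is assumed to hold the names of the time columns (t1..t8).
format_matches <- function(matches, col_names) {
  sprintf("[%s] to [%s]", col_names[matches$From], col_names[matches$To])
}

# Stand-in for the result printed above
matches <- data.frame(Row = c(1, 1, 2, 5), From = c(3, 7, 4, 6), To = c(4, 8, 5, 7))
col_names <- paste0("t", 1:8)

out <- format_matches(matches, col_names)
cat("The timing of 30 37 is:", out, sep = "\n")
```

The number of occurrences is then simply length(out), or nrow(matches).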
Upvotes: 1
Reputation: 388797
Here is one function which can be helpful. For every row, we paste every element with its next element and check whether the result matches the pair of numbers passed in. The function returns a data frame with the row number and the column names where a match is found.
return_match <- function(df, x, y) {
  #Paste the numbers to match
  concat_str <- paste(x, y, sep = "-")
  #For every row in dataframe
  do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
    #Subset the row
    x <- df[i, ]
    #Paste every value with its next value and compare it with concat_str
    inds = paste(x[-length(x)], x[-1L], sep = "-") == concat_str
    if(any(inds)) {
      #Get the column numbers to match
      row <- which(inds)
      #subset the column name and add row number
      transform(as.data.frame(t(sapply(row, function(y)
        names(df)[c(y, y + 1)]))), row = i)
    }
  }))
}
return_match(df, 30, 37)
# V1 V2 row
#1 t3 t4 1
#2 t7 t8 1
#3 t4 t5 2
#4 t6 t7 5
return_match(df, 39, 34)
# V1 V2 row
#1 t3 t4 3
#2 t3 t4 6
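The function above handles pairs of numbers only. The same idea might be extended to a sequence of any length by comparing each window of consecutive columns against the target. This is only a sketch in base R under a few assumptions: `find_sequence` is a hypothetical name, the data frame is assumed to contain only the numeric time columns, and `obs` is the example data from the question:

```r
obs <- read.table(text = "t1 t2 t3 t4 t5 t6 t7 t8
7 15 30 37 4 11 30 37
4 31 44 30 37 39 44 18
3 49 39 34 44 43 26 24
4 31 26 33 12 47 37 15
3 27 34 23 30 30 37 4
9 46 39 34 8 43 26 24", header = TRUE)

# Hypothetical generalisation: find a target sequence of any length in each row
find_sequence <- function(df, target) {
  stopifnot(length(target) >= 1)
  m <- as.matrix(df)  # rows = observations, columns = time points
  k <- length(target)
  # Every column at which a window of k values can start
  starts <- seq_len(max(ncol(m) - k + 1, 0))
  res <- lapply(seq_len(nrow(m)), function(i) {
    # Keep the start columns whose next k values equal the target
    hit <- vapply(starts,
                  function(j) isTRUE(all(m[i, j:(j + k - 1)] == target)),
                  logical(1))
    s <- starts[hit]
    data.frame(row = rep(i, length(s)),
               from = colnames(m)[s],
               to = colnames(m)[s + k - 1],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, res)
}

pairs <- find_sequence(obs, c(30, 37))
triple <- find_sequence(obs, c(39, 34, 44))
print(pairs)
print(triple)
```

Comparing the values directly, rather than pasting them into strings, keeps the comparison numeric. Rows without a match contribute zero rows to the result.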
Upvotes: 0