Marien

Reputation: 13

Count the number of sequence matches across two consecutive rows in a data frame

       Tijd nummer schaap                     code   Modifier comment status
1     2.971             1                stilstaan       <NA>      NA  START
2     5.457             1                   ruiken aan object      NA  POINT
3    10.703             1                stilstaan       <NA>      NA   STOP
4    10.704             1                    lopen       <NA>      NA  START
5    12.959             1                    lopen       <NA>      NA   STOP
6    12.960             1                stilstaan       <NA>      NA  START
7    22.732             1                   ruiken aan object      NA  POINT
8    29.383             1                stilstaan       <NA>      NA   STOP
9    29.384             1                    lopen       <NA>      NA  START
10   42.568             1                    lopen       <NA>      NA   STOP
11   42.569             1                   ruiken aan object      NA  POINT
12   49.206             1                    lopen       <NA>      NA  START
13   66.533             1                    lopen       <NA>      NA   STOP
14   66.534             1                stilstaan       <NA>      NA  START
15   67.134             1                   ruiken aan object      NA  POINT
16   72.999             1                stilstaan       <NA>      NA   STOP
17   73.000             1                    lopen       <NA>      NA  START
18   77.480             1                    lopen       <NA>      NA   STOP
19   77.481             1                stilstaan       <NA>      NA  START
20   81.773             1               rondkijken       <NA>      NA  START

I'm a behavioral biology student doing an internship. I have always used R for my statistics, but I don't know how to do what I want here. This data frame contains my observations; the labels are in Dutch ("stilstaan" is standing still, "ruiken" is sniffing, "aan object" is at an object). I want to count how many times "stilstaan" is followed by "ruiken" with the modifier "aan object". I can count how many times "stilstaan" is followed by "ruiken" using the code below, but I don't know how to include the modifier. Is there a way to do this, or am I asking for the impossible?

S=Excel_bestand_schapen
seq=c("stilstaan", "ruiken")
library(zoo)
result=rollapply(S, 2, identical, seq)
length(result[result == TRUE])    

Upvotes: 1

Views: 89

Answers (3)

CPak

Reputation: 13581

You can collapse the relevant columns into a single string

collapse <- paste(paste(dat$code, dat$Modifier), collapse=" ")
# [1] "stilstaan NA ruiken aan object stilstaan NA lopen ...

And define the pattern you want to search for

pattern <- "stilstaan NA ruiken aan object"

Use stringr::str_count to count matches. The string to search comes first and the pattern second:

stringr::str_count(collapse, pattern)
# 3
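One possible caveat with the collapsed-string approach is that it matches substrings, so a label that merely contains the pattern text would be counted too. A minimal base R sketch of a pairwise variant, assuming a made-up subset of the rows (the vectors and the names `labels`, `pairs` and `n_matches` are illustrative, not from the original data):

```r
# Illustrative vectors standing in for dat$code and dat$Modifier (made up)
code     <- c("stilstaan", "ruiken", "stilstaan", "lopen",
              "lopen", "stilstaan", "ruiken", "stilstaan")
Modifier <- c(NA, "aan object", NA, NA, NA, NA, "aan object", NA)

# One label per row, e.g. "stilstaan NA" or "ruiken aan object"
labels <- paste(code, Modifier)

# Join each row's label with the label of the row that follows it
pairs <- paste(head(labels, -1), tail(labels, -1), sep = " | ")

# Count exact matches of the two-row sequence
n_matches <- sum(pairs == "stilstaan NA | ruiken aan object")
n_matches
```

Because each pair is compared as a whole string, only rows whose code and modifier match exactly are counted.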

Upvotes: 0

r.user.05apr

Reputation: 5456

Using base R and www's dat-data-frame:

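# Compare each row with the next one; the trailing NA keeps the vectors the same length
next_code <- c(dat$code[-1], NA)
next_mod  <- c(dat$Modifier[-1], NA)

# na.rm = TRUE skips comparisons that evaluate to NA
sum(dat$code == "stilstaan" & next_code == "ruiken" & next_mod == "aan object",
    na.rm = TRUE)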
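A note on missing values in a comparison like the one above: `==` returns NA when either side is NA, and `sum()` returns NA if any element is NA unless `na.rm = TRUE` is set. The sample data never hits this case, but a "ruiken" row without a modifier would. A small sketch with made-up vectors (not the question's data) shows the effect:

```r
# Made-up vectors: the second "ruiken" has no modifier
code     <- c("stilstaan", "ruiken", "stilstaan", "ruiken")
Modifier <- c(NA, "aan object", NA, NA)

# Shift both vectors so each row lines up with the row that follows it
next_code <- c(code[-1], NA)
next_mod  <- c(Modifier[-1], NA)

hit <- code == "stilstaan" & next_code == "ruiken" & next_mod == "aan object"

total_strict <- sum(hit)                # NA, because one comparison is NA
total        <- sum(hit, na.rm = TRUE)  # counts only the definite matches
```

Whether an undetermined comparison should be dropped or treated as an error depends on the data; `na.rm = TRUE` simply drops it.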

Upvotes: 0

www

Reputation: 39154

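We can use the following code to keep only the rows that meet the requirement. lead shifts a vector forward by one position, so each row is compared with the row that follows it. Using %in% instead of == means that NA values yield FALSE rather than NA. For this dataset the answer is three.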

library(dplyr)

dat2 <- dat %>%
  filter(code %in% "stilstaan" & lead(code) %in% "ruiken" & lead(Modifier) %in% "aan object") 

nrow(dat2)
# [1] 3
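The sample contains a single sheep, but if the full data hold several animals (the "nummer schaap" column), comparing each row with the next would also count a pair that straddles two animals. A minimal base R sketch of a per-animal count, assuming made-up vectors and a hypothetical helper `count_transitions`:

```r
# Hypothetical helper: counts "stilstaan" rows directly followed by
# "ruiken" with modifier "aan object" within one animal's rows
count_transitions <- function(code, modifier) {
  n <- length(code)
  if (n < 2) return(0L)
  sum(code[-n] == "stilstaan" &
        code[-1] == "ruiken" &
        modifier[-1] == "aan object",
      na.rm = TRUE)
}

# Made-up data for two sheep; rows 3 and 4 belong to different animals
sheep    <- c(1, 1, 1, 2, 2)
code     <- c("stilstaan", "ruiken", "stilstaan", "ruiken", "lopen")
modifier <- c(NA, "aan object", NA, "aan object", NA)

# Split the row indices by animal and count within each group
idx <- split(seq_along(code), sheep)
per_sheep <- sapply(idx, function(i) count_transitions(code[i], modifier[i]))
per_sheep
```

Here the ungrouped count would be 2, while the per-animal counts are 1 and 0, because the second pair crosses the boundary between the two sheep.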

DATA

dat <- read.table(text = "       Tijd 'nummer schaap'                     code   Modifier comment status
1     2.971             1                stilstaan       NA      NA  START
                  2     5.457             1                   ruiken 'aan object'      NA  POINT
                  3    10.703             1                stilstaan         NA      NA   STOP
                  4    10.704             1                    lopen         NA      NA  START
                  5    12.959             1                    lopen         NA      NA   STOP
                  6    12.960             1                stilstaan         NA      NA  START
                  7    22.732             1                   ruiken 'aan object'      NA  POINT
                  8    29.383             1                stilstaan         NA      NA   STOP
                  9    29.384             1                    lopen         NA      NA  START
                  10   42.568             1                    lopen         NA      NA   STOP
                  11   42.569             1                   ruiken 'aan object'      NA  POINT
                  12   49.206             1                    lopen         NA      NA  START
                  13   66.533             1                    lopen         NA      NA   STOP
                  14   66.534             1                stilstaan         NA      NA  START
                  15   67.134             1                   ruiken 'aan object'      NA  POINT
                  16   72.999             1                stilstaan         NA      NA   STOP
                  17   73.000             1                    lopen         NA      NA  START
                  18   77.480             1                    lopen         NA      NA   STOP
                  19   77.481             1                stilstaan         NA      NA  START
                  20   81.773             1               rondkijken         NA      NA  START",
                  header = TRUE, stringsAsFactors = FALSE)

Upvotes: 1
