ail
ail

Reputation: 149

Filter data with specific pattern on following line in R

I have data set that shows below:

ID, Colour
1, Red
1, Yellow
1, Blue
1, Green
1, Pink
1, Red
1, Yellow
1, Blue
1, Red
1, Red
2, Red
2, Yellow
2, Blue
2, Blue
2, Yellow
2, Red
2, Blue
2, Red
2, Red
3, Blue
3, Blue
3, Red
3, Red
3, Yellow
3, Blue
3, Red
3, Yellow
3, Blue

I want to filter the row, which consists of this pattern of Red, Yellow, Blue.

The result should be like this, with the index of each pattern:

ID, Colour, Index
        1, Red, 1
        1, Yellow, 1
        1, Blue, 1
        1, Red, 2
        1, Yellow, 2
        1, Blue, 2
        2, Red, 3
        2, Yellow, 3
        2, Blue, 3
        3, Red, 4
        3, Yellow, 4
        3, Blue, 4
        3, Red, 5
        3, Yellow, 5
        3, Blue, 5

Thank you in advance, hopefully, anyone can help.

Upvotes: 1

Views: 84

Answers (2)

langtang
langtang

Reputation: 24742

Here is a data.table approach, that uses Reduce and shift to find the IDs that have the pattern

library(data.table)

out = setDT(d)[Reduce(paste, shift(Colour,0:-2))=="Red Yellow Blue"]
out = out[,index:=.I][rep(1:.N,3)][, Colour:=rep(c("Red", "Yellow", "Blue"), length.out=.N)]

       ID Colour index
    <int> <char> <int>
 1:     1    Red     1
 2:     1 Yellow     2
 3:     2   Blue     3
 4:     3    Red     4
 5:     3 Yellow     5
 6:     1   Blue     1
 7:     1    Red     2
 8:     2 Yellow     3
 9:     3   Blue     4
10:     3    Red     5
11:     1 Yellow     1
12:     1   Blue     2
13:     2    Red     3
14:     3 Yellow     4
15:     3   Blue     5

Another good way to do this is create a function, f that creates the output, given some ids

f <- function(id) {
  data.table(
    ID = rep(id,each=3),
    Colour=rep(c("Red", "Yellow","Blue"),times=length(id)),
    index=rep(1:length(id), times=3))
}

Just get the ids, using the above strategy

ids = setDT(d)[Reduce(paste, shift(Colour,0:-2))=="Red Yellow Blue"]$ID

and apply to the function

f(ids)

       ID Colour index
    <int> <char> <int>
 1:     1    Red     1
 2:     1 Yellow     2
 3:     1   Blue     3
 4:     1    Red     4
 5:     1 Yellow     5
 6:     1   Blue     1
 7:     2    Red     2
 8:     2 Yellow     3
 9:     2   Blue     4
10:     3    Red     5
11:     3 Yellow     1
12:     3   Blue     2
13:     3    Red     3
14:     3 Yellow     4
15:     3   Blue     5

Upvotes: 1

akrun
akrun

Reputation: 887501

We may use

library(dplyr)
library(zoo)
df1 %>%
    group_by(ID) %>%
    slice(sort(unique(rep(which(rollapply(Colour, width = 3,
       FUN = toString) == "Red, Yellow, Blue"), each = 3) + 0:2))) %>%
    ungroup %>%
    mutate(Index = cumsum(Colour == "Red"))

-output

# A tibble: 15 × 3
      ID Colour Index
   <int> <chr>  <int>
 1     1 Red        1
 2     1 Yellow     1
 3     1 Blue       1
 4     1 Red        2
 5     1 Yellow     2
 6     1 Blue       2
 7     2 Red        3
 8     2 Yellow     3
 9     2 Blue       3
10     3 Red        4
11     3 Yellow     4
12     3 Blue       4
13     3 Red        5
14     3 Yellow     5
15     3 Blue       5

data

df1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L), Colour = c("Red", "Yellow", "Blue", "Green", "Pink", 
"Red", "Yellow", "Blue", "Red", "Red", "Red", "Yellow", "Blue", 
"Blue", "Yellow", "Red", "Blue", "Red", "Red", "Blue", "Blue", 
"Red", "Red", "Yellow", "Blue", "Red", "Yellow", "Blue")), row.names = c(NA, 
-28L), class = "data.frame")

Upvotes: 2

Related Questions