Reputation: 149
I have data set that shows below:
ID, Colour
1, Red
1, Yellow
1, Blue
1, Green
1, Pink
1, Red
1, Yellow
1, Blue
1, Red
1, Red
2, Red
2, Yellow
2, Blue
2, Blue
2, Yellow
2, Red
2, Blue
2, Red
2, Red
3, Blue
3, Blue
3, Red
3, Red
3, Yellow
3, Blue
3, Red
3, Yellow
3, Blue
I want to filter the row, which consists of this pattern of Red, Yellow, Blue.
The result should be like this, with the index of each pattern:
ID, Colour, Index
1, Red, 1
1, Yellow, 1
1, Blue, 1
1, Red, 2
1, Yellow, 2
1, Blue, 2
2, Red, 3
2, Yellow, 3
2, Blue, 3
3, Red, 4
3, Yellow, 4
3, Blue, 4
3, Red, 5
3, Yellow, 5
3, Blue, 5
Thank you in advance, hopefully, anyone can help.
Upvotes: 1
Views: 84
Reputation: 24742
Here is a data.table approach, that uses Reduce
and shift
to find the IDs that have the pattern
library(data.table)
out = setDT(d)[Reduce(paste, shift(Colour,0:-2))=="Red Yellow Blue"]
out = out[,index:=.I][rep(1:.N,3)][, Colour:=rep(c("Red", "Yellow", "Blue"), length.out=.N)]
ID Colour index
<int> <char> <int>
1: 1 Red 1
2: 1 Yellow 2
3: 2 Blue 3
4: 3 Red 4
5: 3 Yellow 5
6: 1 Blue 1
7: 1 Red 2
8: 2 Yellow 3
9: 3 Blue 4
10: 3 Red 5
11: 1 Yellow 1
12: 1 Blue 2
13: 2 Red 3
14: 3 Yellow 4
15: 3 Blue 5
Another good way to do this is create a function, f
that creates the output, given some ids
f <- function(id) {
data.table(
ID = rep(id,each=3),
Colour=rep(c("Red", "Yellow","Blue"),times=length(id)),
index=rep(1:length(id), times=3))
}
Just get the ids, using the above strategy
ids = setDT(d)[Reduce(paste, shift(Colour,0:-2))=="Red Yellow Blue"]$ID
and apply to the function
f(ids)
ID Colour index
<int> <char> <int>
1: 1 Red 1
2: 1 Yellow 2
3: 1 Blue 3
4: 1 Red 4
5: 1 Yellow 5
6: 1 Blue 1
7: 2 Red 2
8: 2 Yellow 3
9: 2 Blue 4
10: 3 Red 5
11: 3 Yellow 1
12: 3 Blue 2
13: 3 Red 3
14: 3 Yellow 4
15: 3 Blue 5
Upvotes: 1
Reputation: 887501
We may use
library(dplyr)
library(zoo)
df1 %>%
group_by(ID) %>%
slice(sort(unique(rep(which(rollapply(Colour, width = 3,
FUN = toString) == "Red, Yellow, Blue"), each = 3) + 0:2))) %>%
ungroup %>%
mutate(Index = cumsum(Colour == "Red"))
-output
# A tibble: 15 × 3
ID Colour Index
<int> <chr> <int>
1 1 Red 1
2 1 Yellow 1
3 1 Blue 1
4 1 Red 2
5 1 Yellow 2
6 1 Blue 2
7 2 Red 3
8 2 Yellow 3
9 2 Blue 3
10 3 Red 4
11 3 Yellow 4
12 3 Blue 4
13 3 Red 5
14 3 Yellow 5
15 3 Blue 5
df1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L), Colour = c("Red", "Yellow", "Blue", "Green", "Pink",
"Red", "Yellow", "Blue", "Red", "Red", "Red", "Yellow", "Blue",
"Blue", "Yellow", "Red", "Blue", "Red", "Red", "Blue", "Blue",
"Red", "Red", "Yellow", "Blue", "Red", "Yellow", "Blue")), row.names = c(NA,
-28L), class = "data.frame")
Upvotes: 2