Reputation: 53
I'm trying to extract the first row of each run of b, i.e. the row where the value of b changes:
a b
1 1 A
2 4 A
3 5 A
4 3 B
5 3 B
6 2 B
7 4 B
8 6 A
9 2 A
10 4 C
11 1 C
So the result I'm expecting would be:
a b
1 1 A
2 3 B
3 6 A
4 4 C
I thought about using the lag function in dplyr to extract the rows where the previous value of b is different, but couldn't manage to do it.
Any help would be much appreciated!
Upvotes: 5
Views: 625
Reputation: 887128
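An option with data.table: rleid(b) numbers each run of consecutive identical values in b, and rowid() numbers the rows within each run, so keeping the rows where that number is 1 returns the first row of every run. Note that setDT() converts df to a data.table by reference.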
library(data.table)
setDT(df)[rowid(rleid(b)) == 1]
# a b
#1: 1 A
#2: 3 B
#3: 6 A
#4: 4 C
Data:
df <- structure(list(a = c(1L, 4L, 5L, 3L, 3L, 2L, 4L, 6L, 2L, 4L,
1L), b = c("A", "A", "A", "B", "B", "B", "B", "A", "A", "C",
"C")), class = "data.frame", row.names = c("1", "2", "3", "4",
"5", "6", "7", "8", "9", "10", "11"))
Upvotes: 1
Reputation: 30474
If you did want to use lag and compare consecutive values of b to build the groups, you could do:
library(dplyr)
df %>%
# grp increases by 1 every time b differs from the previous row
group_by(grp = cumsum(b != lag(b, default = first(b)))) %>%
slice(1)
Setting the default to first(b) means the first row is compared with itself, so the comparison returns FALSE instead of NA and cumsum starts at 0 for the first run.
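To make the grouping step easier to inspect, here is a minimal base R sketch of the same idea, with no dplyr. The names prev_b, changed, grp and result are only illustrative, and the data frame is rebuilt from the values in the question.

```r
# Sample data from the question.
df <- data.frame(
  a = c(1L, 4L, 5L, 3L, 3L, 2L, 4L, 6L, 2L, 4L, 1L),
  b = c("A", "A", "A", "B", "B", "B", "B", "A", "A", "C", "C"),
  stringsAsFactors = FALSE
)

# Previous value of b. The first element is compared with itself,
# which mirrors lag(b, default = first(b)).
prev_b <- c(df$b[1], head(df$b, -1))

# TRUE wherever b differs from the row above.
changed <- df$b != prev_b

# The running count of changes gives one id per run.
grp <- cumsum(changed)

# Keep the first row of each run.
result <- df[!duplicated(grp), ]
print(result)
```

Printing grp before subsetting shows how each change in b starts a new group id.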
Upvotes: 0
Reputation: 39858
One option could be:
library(dplyr)
df %>%
# rle(b) gives the run lengths; rep() expands them into one id per row
group_by(rleid = with(rle(b), rep(seq_along(lengths), lengths))) %>%
slice(1)
a b rleid
<int> <chr> <int>
1 1 A 1
2 3 B 2
3 6 A 3
4 4 C 4
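As a variation on the rle idea, the run lengths can also be turned directly into the row index where each run starts, which avoids grouping altogether. This is a sketch in base R only, not the approach used above; runs, starts and result are illustrative names, and the data frame is rebuilt from the values in the question.

```r
# Sample data from the question.
df <- data.frame(
  a = c(1L, 4L, 5L, 3L, 3L, 2L, 4L, 6L, 2L, 4L, 1L),
  b = c("A", "A", "A", "B", "B", "B", "B", "A", "A", "C", "C"),
  stringsAsFactors = FALSE
)

# Run-length encoding of b: one entry per run of identical values.
runs <- rle(df$b)

# The first run starts at row 1; every later run starts right after
# the rows consumed by all the runs before it.
starts <- cumsum(c(1L, head(runs$lengths, -1)))

# Pick the starting row of each run.
result <- df[starts, ]
print(result)
```

The original row names are kept on the result, so it also shows where in df each run began.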
Upvotes: 1