tcor

Reputation: 53

Extract rows when the value changes in a column with dplyr in R

I'm trying to extract the row at each point where the value of b changes:

   a b
1  1 A
2  4 A
3  5 A
4  3 B
5  3 B
6  2 B
7  4 B
8  6 A
9  2 A
10 4 C
11 1 C

So the result I'm expecting would be:

  a b
1 1 A
2 3 B
3 6 A
4 4 C

I thought about using the lag function in dplyr to extract the rows where the previous value of b is different, but couldn't manage to do it.

Any help would be very much appreciated!

Upvotes: 5

Views: 625

Answers (3)

akrun

Reputation: 887128

An option with data.table

library(data.table)
setDT(df)[rowid(rleid(b)) == 1]
#   a b
#1: 1 A
#2: 3 B
#3: 6 A
#4: 4 C

data

df <- structure(list(a = c(1L, 4L, 5L, 3L, 3L, 2L, 4L, 6L, 2L, 4L, 
1L), b = c("A", "A", "A", "B", "B", "B", "B", "A", "A", "C", 
"C")), class = "data.frame", row.names = c("1", "2", "3", "4", 
"5", "6", "7", "8", "9", "10", "11"))

Upvotes: 1

Ben

Reputation: 30474

If you did want to use lag and compare for differences to group, then you could do:

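library(dplyr)

df %>%
  group_by(grp = cumsum(b != lag(b, default = first(b)))) %>%
  slice(1)

Here, lag(b) would return NA for the first row, so the default is set to first(b). That makes the comparison FALSE on the first row, the cumulative sum starts at 0, and it increases by one every time b differs from the previous row. Each run of identical values therefore gets its own grp, and slice(1) keeps the first row of each run.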
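If the grouping column is not needed, the same lag comparison can probably be used directly in filter. This is only a sketch: it rebuilds the example data from the question, assumes b contains no NA values, and the name changes is just a placeholder.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  a = c(1L, 4L, 5L, 3L, 3L, 2L, 4L, 6L, 2L, 4L, 1L),
  b = c("A", "A", "A", "B", "B", "B", "B", "A", "A", "C", "C")
)

# Keep the first row, plus every row whose b differs from the previous row.
# lag(b) is NA on the first row, so that row is kept explicitly.
changes <- df %>%
  filter(row_number() == 1 | b != lag(b))

changes
```

Since nothing is grouped, there is no helper column to drop afterwards.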

Upvotes: 0

tmfmnk

Reputation: 39858

One option could be:

library(dplyr)

df %>%
  group_by(rleid = with(rle(b), rep(seq_along(lengths), lengths))) %>%
  slice(1)

      a b     rleid
  <int> <chr> <int>
1     1 A         1
2     3 B         2
3     6 A         3
4     4 C         4
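The run lengths from rle can also give the starting row of each run directly, without any grouping. A minimal base R sketch, assuming the example data from the question; the names runs, starts and result are only illustrative.

```r
# Example data from the question
df <- data.frame(
  a = c(1L, 4L, 5L, 3L, 3L, 2L, 4L, 6L, 2L, 4L, 1L),
  b = c("A", "A", "A", "B", "B", "B", "B", "A", "A", "C", "C")
)

# Run-length encoding of b: lengths of consecutive identical values
runs <- rle(df$b)

# The first run starts at row 1; each later run starts right after
# the previous runs end, so a cumulative sum gives the start positions
starts <- cumsum(c(1L, head(runs$lengths, -1)))

# Index the data frame at the start of each run
result <- df[starts, ]

result
```

This keeps the original row names, so the rows can be traced back to their positions in df.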

Upvotes: 1
