Reputation: 2821
Suppose I have a data frame like this:
library(dplyr)
data <- tibble(
label = c("a","a","b","a","c","c","a")
)
data$index <- 1:nrow(data)
I don't want to subset all the rows where label == "a", but only the first run of consecutive rows where this is true.
In the example, I would want the first two rows :
label index
<chr> <int>
1 a 1
2 a 2
because in the next row the label is "b". All subsequent rows where label == "a" should be ignored.
I have implemented an ugly solution with a for loop, but surely there is an efficient way to filter like this?
Upvotes: 3
Views: 1246
Reputation: 887118
Another option is to compare the column with its lag, build a numeric run index with cumsum, and turn that into a logical condition to filter on:
library(dplyr)
data %>%
filter(cumsum(label != lag(label, default = first(label))) < 1)
# A tibble: 2 x 2
# label index
# <chr> <int>
#1 a 1
#2 a 2
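To see why the condition works, the intermediate vectors can be inspected one step at a time. This is a minimal sketch on the question's label vector; the names changed, run_id and keep are made up for illustration and are not part of the answer's code:

```r
library(dplyr)

label <- c("a", "a", "b", "a", "c", "c", "a")

# TRUE wherever the value differs from the previous one
# (the first element is compared with itself, so it is FALSE)
changed <- label != lag(label, default = first(label))

# Number of changes seen so far: 0 0 1 2 3 3 4
run_id <- cumsum(changed)

# Only the rows before the first change are kept
keep <- run_id < 1
```

Note that this keeps the first run whatever its label is; it happens to be "a" in the example data.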
Upvotes: 0
Reputation: 8880
You can use:
data %>%
filter(data.table::rleid(label) == 1)
# A tibble: 2 x 2
label index
<chr> <int>
1 a 1
2 a 2
Upvotes: 2
Reputation: 11584
If you want to use just rle:
library(dplyr)
data %>% filter(rep(seq_along(rle(label)$values), rle(label)$lengths) == 1)
# A tibble: 2 x 2
label index
<chr> <int>
1 a 1
2 a 2
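The run-based filters above keep the first run in the data, which is "a" in the example. If the data might start with another label, a small base R helper can pick out the first run of a given value instead. This is only a sketch: first_run_of is a hypothetical name, and it assumes that "first run" means the earliest block of consecutive matches:

```r
# Hypothetical helper: logical vector marking the first consecutive
# run of `value` in `x`, wherever that run starts.
first_run_of <- function(x, value) {
  hit <- x %in% value                      # %in% treats NA as "no match"
  if (!any(hit)) return(rep(FALSE, length(x)))
  # New run id each time `hit` flips between TRUE and FALSE
  run_id <- cumsum(c(TRUE, hit[-1] != hit[-length(hit)]))
  # Keep only the run that contains the first match
  run_id == run_id[which(hit)[1]]
}

label <- c("b", "a", "a", "c", "a")
first_run_of(label, "a")
```

With the question's data this could be used as data[first_run_of(data$label, "a"), ], or inside filter() as filter(first_run_of(label, "a")).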
Upvotes: 1
Reputation: 39858
One option could be:
data %>%
slice_max(label == "a", n = 2, with_ties = FALSE)
label index
<chr> <int>
1 a 1
2 a 2
However, it may give unexpected results when n is larger than the number of matching rows, because the remaining slots are then filled with non-matching rows. One way to avoid this:
data %>%
slice(head(which(label == "c"), 3))
label index
<chr> <int>
1 c 5
2 c 6
Upvotes: 1