Elys


filter() rows from dataframe with condition on previous and next row, keeping NA values

I have a dataframe like this:

AA<-c(1,2,4,5,6,7,10,11,12,13,14,15)
BB<-c(32,21,21,NA,27,31,31,12,28,NA,48,7) 
df<- data.frame(AA,BB)

I want to remove rows whose BB value is equal to both the previous and the next row's value, so that only the first and last occurrence of each run of identical BB values is kept. I also want to keep the rows where BB is NA. I came up with this code, which is close to what I want:

library(dplyr)
lighten_df <- df %>% filter(BB != lag(BB) | BB != lead(BB) | is.na(BB))

which gives me:

> lighten_df
AA BB
1   1 32
2   2 21
3   5 NA
4   6 27
5   7 31
6  10 31
7  11 12
8  12 28
9  13 NA
10 14 48
11 15  7

My problem is that I would also like to keep both the first and the last 21 in column BB (the rows with AA = 2 and AA = 4). This is the result I expect:

AA BB
1   1 32
2   2 21
3   4 21
4   5 NA
5   6 27
6   7 31
7  10 31
8  11 12
9  12 28
10 13 NA
11 14 48
12 15  7

Any idea?
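A note on why the original `filter()` drops the row with AA = 4: there BB is 21 and the next value is NA, so `BB != lead(BB)` evaluates to NA, and `FALSE | NA | FALSE` is NA. `filter()` keeps only rows where the condition is TRUE, so that row is discarded. The sketch below is one way to make the comparison NA-safe, assuming dplyr is installed; `same_as_prev()` and `same_as_next()` are hypothetical helper names introduced here for illustration.

```r
library(dplyr)

AA <- c(1, 2, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15)
BB <- c(32, 21, 21, NA, 27, 31, 31, 12, 28, NA, 48, 7)
df <- data.frame(AA, BB)

# Why the row is lost: a comparison with NA is NA, and FALSE | NA stays NA.
print(FALSE | NA)  # NA

# Hypothetical helpers: compare with the neighbouring value and treat
# an NA result (NA value or no neighbour at the ends) as "not equal".
# dplyr:: is spelled out so stats::lag() is never picked up by mistake.
same_as_prev <- function(x) coalesce(x == dplyr::lag(x), FALSE)
same_as_next <- function(x) coalesce(x == dplyr::lead(x), FALSE)

# Drop a row only if it equals BOTH neighbours; always keep NA rows.
lighten_df <- df %>%
  filter(is.na(BB) | !(same_as_prev(BB) & same_as_next(BB)))
```

With this data no run of identical values is longer than two, so all 12 rows are kept, which matches the expected output; in a run of three or more identical values, only the first and last rows of the run would remain.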


Gregor Thomas

I would suggest a different approach: define a grouping variable that identifies each run of consecutive identical BB values, and keep the first and last rows within each group:

library(dplyr)

df %>%
  group_by(grp = data.table::rleid(BB)) %>%
  slice(unique(c(1, n())))
# # A tibble: 12 × 3
# # Groups:   grp [10]
#       AA    BB   grp
#    <dbl> <dbl> <int>
#  1     1    32     1
#  2     2    21     2
#  3     4    21     2
#  4     5    NA     3
#  5     6    27     4
#  6     7    31     5
#  7    10    31     5
#  8    11    12     6
#  9    12    28     7
# 10    13    NA     8
# 11    14    48     9
# 12    15     7    10
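If you would rather not depend on data.table, the same kind of run id can be built from base R's `rle()`. The sketch below assumes dplyr is installed; `run_id()` is a hypothetical helper, not part of either package. Note that `rle()` starts a new run at every NA, so each NA row forms its own one-row group and always survives the `slice()` step.

```r
library(dplyr)

AA <- c(1, 2, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15)
BB <- c(32, 21, 21, NA, 27, 31, 31, 12, 28, NA, 48, 7)
df <- data.frame(AA, BB)

# Hypothetical helper: one integer id per run of consecutive equal values.
# rle() returns the length of each run; rep() expands run numbers back
# to one id per element.
run_id <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
}

lighten_df <- df %>%
  group_by(grp = run_id(BB)) %>%
  slice(unique(c(1, n()))) %>%  # first and last row of each run
  ungroup() %>%
  select(-grp)                  # drop the helper column
```

The final `ungroup()` and `select(-grp)` only tidy the result so it has the same two columns, AA and BB, as the original data frame.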

