Reputation: 55
I have the following dataset, where runs of numerical values in column x are interspersed with NAs. I would like to keep the first value of each run of consecutive identical numbers and replace the remaining repeated values in that run with NAs.
x = c(1,1,1,NA,NA,NA,3,3,3,NA,NA,1,1,1,NA)
data = data.frame(x)
> data
x
1 1
2 1
3 1
4 NA
5 NA
6 NA
7 3
8 3
9 3
10 NA
11 NA
12 1
13 1
14 1
15 NA
The final result should be:
> data
x
1 1
2 NA
3 NA
4 NA
5 NA
6 NA
7 3
8 NA
9 NA
10 NA
11 NA
12 1
13 NA
14 NA
15 NA
I would appreciate some suggestions, ideally with dplyr. Thanks!
Upvotes: 1
Views: 103
Reputation: 895
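For those who want to stay within a dplyr workflow, na_if() can set each value to NA whenever it equals the previous row's value, which lag() supplies: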
library(dplyr)
data %>%
  as_tibble() %>%
  mutate(x = na_if(x, lag(x)))
#> # A tibble: 15 × 1
#> x
#> <dbl>
#> 1 1
#> 2 NA
#> 3 NA
#> 4 NA
#> 5 NA
#> 6 NA
#> 7 3
#> 8 NA
#> 9 NA
#> 10 NA
#> 11 NA
#> 12 1
#> 13 NA
#> 14 NA
#> 15 NA
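To see why this works, it may help to lay the intermediate vectors side by side. The sketch below is only an illustration of the same call: the column names `prev`, `same`, and `result` are made up here and are not part of the answer above.

```r
library(dplyr)

x <- c(1, 1, 1, NA, NA, NA, 3, 3, 3, NA, NA, 1, 1, 1, NA)

steps <- tibble(x) %>%
  mutate(
    prev   = lag(x),         # previous row's value; NA for the first row
    same   = x == prev,      # TRUE only when both values are non-NA and equal
    result = na_if(x, prev)  # x, with NA wherever x matches prev
  )

print(steps, n = 15)
```

The first value of each run survives because the row before it holds either NA or a different number, so the comparison never matches there.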
Upvotes: 0
Reputation: 55
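This simple solution seems to work as I expected, although it isn't a dplyr pipeline. Note that it still relies on dplyr's lag(): base R's stats::lag() does not shift a plain vector, so without dplyr loaded the comparison would match every non-NA value.
data$x[data$x == dplyr::lag(data$x)] <- NA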
> data
x
1 1
2 NA
3 NA
4 NA
5 NA
6 NA
7 3
8 NA
9 NA
10 NA
11 NA
12 1
13 NA
14 NA
15 NA
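For a version with no package dependency, the shifted copy can be built by hand. This is a minimal base R sketch of the same idea, not a tested drop-in replacement; the names `prev` and `repeats` are illustrative only.

```r
x <- c(1, 1, 1, NA, NA, NA, 3, 3, 3, NA, NA, 1, 1, 1, NA)
data <- data.frame(x)

# Shift the column down by one position, padding the front with NA.
prev <- c(NA, head(data$x, -1))

# which() drops the NA comparisons, leaving only the positions
# whose value repeats the value directly above it.
repeats <- which(data$x == prev)
data$x[repeats] <- NA

print(data)
```

Using which() keeps the index free of NAs, so the assignment touches only the rows that repeat the previous value.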
Upvotes: 1