Martyna F

Reputation: 55

Keep first duplicate in a sequence across all sequences of numerical values and replace the remaining values with NA in R

I have the following dataset, where runs of numerical values in column x are separated by NAs. I would like to keep the first value of each run and replace the remaining repeated values in that run with NA.

x = c(1,1,1,NA,NA,NA,3,3,3,NA,NA,1,1,1,NA)
data = data.frame(x)

> data
    x
1   1
2   1
3   1
4  NA
5  NA
6  NA
7   3
8   3
9   3
10 NA
11 NA
12  1
13  1
14  1
15 NA

The final result should be:

> data
    x
1   1
2  NA
3  NA
4  NA
5  NA
6  NA
7   3
8  NA
9  NA
10 NA
11 NA
12  1
13 NA
14 NA
15 NA

I would appreciate some suggestions, ideally with dplyr. Thanks!

Upvotes: 1

Views: 103

Answers (2)

Stefan

Reputation: 895

For those who want to stay within a dplyr workflow:

library(dplyr)
data %>%
  as_tibble() %>%
  mutate(x = na_if(x, lag(x)))
#> # A tibble: 15 × 1
#>        x
#>    <dbl>
#>  1     1
#>  2    NA
#>  3    NA
#>  4    NA
#>  5    NA
#>  6    NA
#>  7     3
#>  8    NA
#>  9    NA
#> 10    NA
#> 11    NA
#> 12     1
#> 13    NA
#> 14    NA
#> 15    NA

Upvotes: 0

Martyna F

Reputation: 55

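This simple solution works as I expected. It uses base R indexing rather than a dplyr pipeline, but note that lag() here must be dplyr::lag(). Base R's stats::lag() does not shift a plain vector, so with it the comparison would be TRUE for every non-NA value and the whole column would become NA.

data$x[data$x == dplyr::lag(data$x)] <- NA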

> data
    x
1   1
2  NA
3  NA
4  NA
5  NA
6  NA
7   3
8  NA
9  NA
10 NA
11 NA
12  1
13 NA
14 NA
15 NA
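
If you want to avoid dplyr entirely, the same shift-and-compare idea can be sketched in base R. This is only a minimal sketch for the example data above; `prev` is a name I made up for illustration, and it assumes "duplicate" means "equal to the value directly before it".

```r
x <- c(1, 1, 1, NA, NA, NA, 3, 3, 3, NA, NA, 1, 1, 1, NA)
data <- data.frame(x)

# Build the "previous value" vector by hand: prepend NA, drop the last element
# (prev is a hypothetical helper name, not part of the original answers)
prev <- c(NA, head(data$x, -1))

# which() discards the NA comparisons, so only positions that truly repeat
# the preceding value are selected and overwritten
data$x[which(data$x == prev)] <- NA

data$x
```

Wrapping the comparison in which() is a design choice: it keeps the NA results of `==` out of the index, so the assignment touches only the rows where the value equals its predecessor.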

Upvotes: 1
