Keep first duplicate in a sequence across all sequences of numerical values and replace the remaining values with NA in R

Question

I have the following dataset, where numerical values in column x are interspersed with NAs. I would like to keep only the first value of each run of consecutive identical numerical values and replace the remaining values in that run with NA.

x = c(1,1,1,NA,NA,NA,3,3,3,NA,NA,1,1,1,NA)
data = data.frame(x)

> data
    x
1   1
2   1
3   1
4  NA
5  NA
6  NA
7   3
8   3
9   3
10 NA
11 NA
12  1
13  1
14  1
15 NA

So that the final result should be:

> data
    x
1   1
2  NA
3  NA
4  NA
5  NA
6  NA
7   3
8  NA
9  NA
10 NA
11 NA
12  1
13 NA
14 NA
15 NA

I would appreciate some suggestions, ideally with dplyr. Thanks!

Martyna F · Accepted Answer

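This simple solution seems to work as I expected. It is not a dplyr pipeline, but it does rely on dplyr::lag(), which shifts the vector by one position; base R's stats::lag() does not shift a plain vector, so the comparison would not give the same result.

data$x[data$x == dplyr::lag(data$x)] <- NA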

> data
    x
1   1
2  NA
3  NA
4  NA
5  NA
6  NA
7   3
8  NA
9  NA
10 NA
11 NA
12  1
13 NA
14 NA
15 NA
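Since the question asked for dplyr, the same idea can also be written inside mutate(). The following is only a minimal sketch, assuming dplyr is installed; the name `result` is introduced here for illustration and is not part of the original post. which() drops the NA comparisons, so only positions where a value equals its predecessor are replaced.

```r
library(dplyr)

x <- c(1, 1, 1, NA, NA, NA, 3, 3, 3, NA, NA, 1, 1, 1, NA)
data <- data.frame(x)

# Compare each value with the previous one (dplyr::lag shifts by one position).
# which() keeps only the positions where the comparison is TRUE, ignoring NA,
# and replace() sets those positions to NA.
result <- data %>%
  mutate(x = replace(x, which(x == lag(x)), NA))

print(result)
```

Because the comparison is with the immediately preceding value, a later run with the same value (the second run of 1s here) still keeps its first element.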
