user1627466

Reputation: 423

Removing duplicates in vector but preserving order

Suppose I have a vector:

vec = c(NA,NA,1,NA,NA,NA,1,NA,NA,0,NA,NA,0,NA,NA,0,NA,NA,1,NA,NA,1,NA,NA,0,NA,0)

I would like to get:

vec = c(NA,NA,1,NA,NA,NA,NA,NA,NA,0,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,0,NA,NA)

I have tried a for loop with an if that checks whether each value equals the previous non-NA value, but it doesn't work when a value is repeated more than once.

The approach from "Remove duplicates in vector to next value" doesn't work either, since I want to keep my NAs.
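For reference, here is a minimal sketch of the loop described above with one change that makes it handle repeats: it compares each value with the last non-NA value seen so far (kept in a separate variable) instead of with the element immediately before it. The function name drop_repeats_loop is hypothetical.

```r
# Hypothetical helper: loop version that blanks out any value equal to the
# last non-NA value seen, however many NAs separate the two.
drop_repeats_loop <- function(x) {
  last <- NA                      # last non-NA value seen so far
  for (i in seq_along(x)) {
    if (is.na(x[i])) next         # leave NAs untouched
    if (!is.na(last) && x[i] == last) {
      x[i] <- NA                  # repeat of the previous non-NA value
    } else {
      last <- x[i]                # new value: keep it and remember it
    }
  }
  x
}

vec <- c(NA,NA,1,NA,NA,NA,1,NA,NA,0,NA,NA,0,NA,NA,0,NA,NA,1,NA,NA,1,NA,NA,0,NA,0)
drop_repeats_loop(vec)
```

This is fine for short vectors; the answers below avoid the R-level loop.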

Upvotes: 3

Views: 813

Answers (3)

Carl Witthoft

Reputation: 21492

I think this does it:

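vrl <- rle(vec)                                  # runs of equal values; each NA is its own run
vdif <- diff(vrl$values[!is.na(vrl$values)])     # 0 where a value repeats the previous non-NA value
vdif <- c(1, vdif)                               # prepend 1 so the first non-NA value is kept
vrl$values[!is.na(vrl$values)][vdif == 0] <- NA  # blank out the repeated values
inverse.rle(vrl)                                 # expand the runs back to full length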
# [1] NA NA  1 NA NA NA NA NA NA  0 NA NA NA NA NA NA NA NA
#[19]  1 NA NA NA NA NA  0 NA NA

The trick is to prepend a 1 (any nonzero value would do) to the difference vector so that the first non-NA value is always preserved.
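To see why this works, it helps to look at how rle() treats NA: a new run starts wherever neighbouring elements compare unequal or the comparison is NA, so every NA becomes a run of length 1 and only directly adjacent equal values are merged. The sketch below wraps the steps above in a function (the name rle_dedup is hypothetical). Tracing it also shows one caveat: because inverse.rle() re-expands a merged run, directly adjacent repeats such as c(1, 1, NA, 1) are not fully removed. The question's vector never has two equal values side by side, so the output above is unaffected.

```r
# How rle() splits a vector that contains NAs: each NA is its own run.
r <- rle(c(NA, NA, 1, 1, NA, 1))
r$lengths   # 1 1 2 1 1
r$values    # NA NA 1 NA 1

# The answer's steps wrapped in a function (hypothetical name).
rle_dedup <- function(vec) {
  vrl <- rle(vec)
  vdif <- c(1, diff(vrl$values[!is.na(vrl$values)]))
  vrl$values[!is.na(vrl$values)][vdif == 0] <- NA
  inverse.rle(vrl)
}

# Caveat: the adjacent 1, 1 form a single run, so both copies come back.
rle_dedup(c(1, 1, NA, 1))   # 1 1 NA NA
```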

Upvotes: 3

Simon O'Hanlon

Reputation: 59970

You can do this with a little bit of logic and a compound [ and [<- (subset and subassignment) operation. First we need to find the duplicates. We'll do this with diff() on all the non-NA values:

diff( vec[ ! is.na( vec ) ] )
[1]  0 -1  0  0  1  0 -1  0

Each 0 marks a value equal to the previous non-NA value, i.e. a duplicate. Now we need to find their positions in vec and set them to NA.

#  This gives a TRUE/FALSE vector, one entry per non-NA value of vec; TRUE marks a value to change.
#  Prepending 1 makes the first non-NA value compare as nonzero, so it is always kept.
dups <- c( 1 , diff( vec[ ! is.na( vec ) ] ) ) == 0

#  Now subset vec to its non-NA values and change the duplicates to NA
vec[ ! is.na( vec ) ][ dups ] <- NA
vec
# [1] NA NA  1 NA NA NA NA NA NA  0 NA NA NA NA NA NA NA NA
#[19]  1 NA NA NA NA NA  0 NA NA
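The two steps above can be packaged as a reusable function. This is a sketch: the name drop_repeats and the early return for an all-NA input are additions, not part of the answer. Because it compares the non-NA values one by one, directly adjacent repeats are removed as well.

```r
# Hypothetical wrapper around the diff() approach above.
drop_repeats <- function(x) {
  keep <- !is.na(x)                   # positions holding real values
  if (!any(keep)) return(x)           # nothing to do for an all-NA vector
  dups <- c(1, diff(x[keep])) == 0    # TRUE where a value equals the previous non-NA value
  x[keep][dups] <- NA                 # blank out those repeats; original NAs stay put
  x
}

drop_repeats(c(1, 1, NA, 1, 0))   # 1 NA NA NA 0
```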

Upvotes: 5

Roland

Reputation: 132676

Use duplicated:

vec[duplicated(vec, incomparables=NA)] <- NA

Since repeated NAs would just be set to NA again, you could omit the incomparables parameter in your example:

vec[duplicated(vec)] <- NA

According to the documentation, this might be faster, but you'd need to benchmark it yourself.
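A quick check of this first version on the question's vector shows why the edit below was needed: duplicated() flags every later occurrence of a value anywhere in the vector, not only a repeat of the previous non-NA value, so just the first 1 and the first 0 survive. It also confirms that dropping incomparables gives the same result here.

```r
vec <- c(NA,NA,1,NA,NA,NA,1,NA,NA,0,NA,NA,0,NA,NA,0,NA,NA,1,NA,NA,1,NA,NA,0,NA,0)

a <- vec
a[duplicated(a, incomparables = NA)] <- NA   # NAs are never flagged as duplicates

b <- vec
b[duplicated(b)] <- NA                       # repeated NAs are flagged, but are NA already

identical(a, b)       # TRUE
which(!is.na(a))      # 3 10  (the question wants 3 10 19 25)
```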

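Edit:

After clarification (only a value equal to the previous non-NA value should become NA, whereas duplicated() blanks out every repeat of a value anywhere in the vector):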

vec <- c(NA,NA,1,NA,NA,NA,1,NA,NA,NA,NA,0,NA,NA,0,NA,NA,0,NA,NA,NA,1,NA,NA,1,NA,NA,0,NA,0)
vec2 <- c(NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,0,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,0,NA,NA)

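tmp <- vec[!is.na(vec)]               # the non-NA values, in order
tmp[c(FALSE, diff(tmp)==0)] <- NA     # blank out each value equal to its predecessor; FALSE keeps the first
vec[!is.na(vec)] <- tmp               # write them back, leaving the original NAs in place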

identical(vec, vec2)
#[1] TRUE
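Because diff() only works on numbers, this version (like the other diff()-based answers) fails on a character vector. A minimal variant, assuming you want the same rule for any atomic type that supports ==, compares each non-NA value with its predecessor directly. The name drop_repeats_any is hypothetical.

```r
# Hypothetical variant of the approach above that avoids diff(),
# so it also works for character vectors.
drop_repeats_any <- function(x) {
  tmp <- x[!is.na(x)]                   # the non-NA values, in order
  if (length(tmp) < 2) return(x)        # nothing can repeat
  same <- tmp[-1] == tmp[-length(tmp)]  # TRUE where a value equals its predecessor
  tmp[c(FALSE, same)] <- NA             # FALSE: the first value is always kept
  x[!is.na(x)] <- tmp                   # write back, leaving the original NAs in place
  x
}

drop_repeats_any(c(NA, "a", NA, "a", "b", NA, "b", "a"))
# NA "a" NA NA "b" NA NA "a"
```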

Upvotes: 4
