Saltaf

Reputation: 65

Prevent duplicates in R

I have a column in a data table which has entries in non-decreasing order. But there can be duplicate entries.

library(data.table)
labels <- c(123,123,124,125,126,126,128)
time <- data.table(labels,unique_labels="")
time
  labels unique_labels
1:    123              
2:    123              
3:    124              
4:    125              
5:    126              
6:    126              
7:    128  

I want to make all entries unique, so the output will be

time
      labels unique_labels
1:    123     123           
2:    123     124         
3:    124     125         
4:    125     126         
5:    126     127         
6:    126     128         
7:    128     130

Here is a loop implementation that does this:

prev_label <- 0
unique_counter <- 0
for (i in seq_along(time$labels)) {
    if (time$labels[i] != prev_label) {
        # first occurrence of this label: remember it
        prev_label <- time$labels[i]
    } else {
        # repeated label: shift this and every later entry up by one more
        unique_counter <- unique_counter + 1
    }
    time$unique_labels[i] <- time$labels[i] + unique_counter
}

Upvotes: 1

Views: 87

Answers (2)

Rui Barradas

Reputation: 76683

There's a vectorized solution that avoids the for loop completely. Since time is an R function, I've changed the name of your data.table to tm.

cumsum(duplicated(tm$labels)) + tm$labels
[1] 123 124 125 126 127 128 130

tm$unique_labels <- cumsum(duplicated(tm$labels)) + tm$labels
tm
   labels unique_labels
1:    123           123
2:    123           124
3:    124           125
4:    125           126
5:    126           127
6:    126           128
7:    128           130
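
To see why this works, here is a minimal sketch that breaks the one-liner into its intermediate steps. It uses a plain vector instead of the data.table, and the names is_dup and offset are made up for illustration only.

```r
labels <- c(123, 123, 124, 125, 126, 126, 128)

# TRUE wherever a value has already appeared earlier in the vector
is_dup <- duplicated(labels)

# running count of duplicates seen so far; this is how far each entry is shifted
offset <- cumsum(is_dup)

# shift every label by the number of duplicates at or before its position
unique_labels <- labels + offset
unique_labels
```

For integer labels in non-decreasing order the result is strictly increasing: a repeated label keeps the same value but gains one more in offset, while a new label is at least one larger with the same offset.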

Upvotes: 2

Bruno Zamengo

Reputation: 870

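tank = paste("t", 1:NROW(labels), sep="")
time$unique_labels = ifelse(duplicated(time), tank, time$labels)

The duplicated method from the data.table package returns a logical vector that is TRUE for each duplicated row of your dataset; just replace those rows with "random" values you are sure are not used in your set. Note that the resulting column is character rather than numeric, because the placeholder values are strings.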
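
As a rough illustration of what this approach produces, the sketch below applies the same idea to a plain vector instead of the data.table (an assumption made to keep it self-contained), so duplicated() here is the base R version.

```r
labels <- c(123, 123, 124, 125, 126, 126, 128)

# one placeholder per position: "t1", "t2", ...
tank <- paste("t", seq_along(labels), sep = "")

# keep the first occurrence of each label, swap repeats for their placeholder
unique_labels <- ifelse(duplicated(labels), tank, labels)
unique_labels
# [1] "123" "t2"  "124" "125" "126" "t6"  "128"
```

The entries are unique, but they are strings and no longer follow the numeric ordering shown in the question's desired output.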

Upvotes: 1
