Daren Eiri
Daren Eiri

Reputation: 137

Increment by one to each duplicate value

I am trying to find a proper way, in R, to find duplicated values, and add the value 1 to each subsequent duplicated value grouped by id. For example:

data = data.table(id = c('1','1','1','1','1','2','2','2'),
                  value = c(95,100,101,101,101,20,35,38))

data$new_value <- ifelse(data[ , data$value] == lag(data$value,1),
                         lag(data$value, 1) + 1 ,data$value)
data$desired_value <- c(95,100,101,102,103,20,35,38)

Produces:

   id value new_value desired_value
1:  1    95        NA            95
2:  1   100       100           100
3:  1   101       101           101 # first 101 in id 1: add 0
4:  1   101       102           102 # second 101 in id 1: add 1
5:  1   101       102           103 # third 101 in id 1: add 2
6:  2    20        20            20
7:  2    35        35            35
8:  2    38        38            38

I tried doing this with ifelse, but it doesn't work recursively so it only applies to the following row, and not any subsequent rows. Also the lag function results in me losing the first value in value.

I've seen examples with character variables with make.names or make.unique, but haven't been able to find a solution for a duplicated numeric value.

Background: I am doing a survival analysis and I am finding that with my data there are stop times that are the same, so I need to make it unique by adding a 1 (stop times are in seconds).

Upvotes: 6

Views: 2450

Answers (4)

Henrik
Henrik

Reputation: 67778

To avoid (a potentially costly) by, you may use rowid:

data[, res := value + rowid(id, value) - 1]
# data
#    id value new_value desired_value res
# 1:  1    95        96            95  95
# 2:  1   100       101           100 100
# 3:  1   101       102           101 101
# 4:  1   101       102           102 102
# 5:  1   101       102           103 103
# 6:  2    20        21            20  20
# 7:  2    35        36            35  35
# 8:  2    38        39            38  38 

Upvotes: 1

thelatemail
thelatemail

Reputation: 93813

Here's an attempt. You're essentially grouping by id and value and adding 0:(length(value)-1). So:

data[, onemore := value + (0:(.N-1)), by=.(id, value)]

#   id value new_value desired_value onemore
#1:  1    95        96            95      95
#2:  1   100       101           100     100
#3:  1   101       102           101     101
#4:  1   101       102           102     102
#5:  1   101       102           103     103
#6:  2    20        21            20      20
#7:  2    35        36            35      35
#8:  2    38        39            38      38

Upvotes: 6

akrun
akrun

Reputation: 887128

Here is one option with tidyverse

library(dplyr)
data %>%
    group_by(id, value) %>%
    mutate(onemore = value + row_number()-1)
#     id value onemore
#  <chr> <dbl>   <dbl>
#1     1    95      95
#2     1   100     100
#3     1   101     101
#4     1   101     102
#5     1   101     103
#6     2    20      20
#7     2    35      35
#8     2    38      38

Or we can use base R without anonymous function call

data$onemore <- with(data, value + ave(value, id, value, FUN =seq_along)-1)
data$onemore
#[1]  95 100 101 102 103  20  35  38

Upvotes: 5

Ronak Shah
Ronak Shah

Reputation: 388982

With base R we can use ave where we take the first value of each group and basically add the row number of that row in that group.

data$value1 <- ave(data$value, data$id, data$value, FUN = function(x)
                                                      x[1] + seq_along(x) - 1)

#   id value new_value desired_value value1
#1:  1    95        96            95     95
#2:  1   100       101           100    100
#3:  1   101       102           101    101
#4:  1   101       102           102    102
#5:  1   101       102           103    103
#6:  2    20        21            20     20
#7:  2    35        36            35     35
#8:  2    38        39            38     38

Upvotes: 5

Related Questions