Reputation: 137
I am trying to find a proper way, in R, to find duplicated values, and add the value 1 to each subsequent duplicated value grouped by id. For example:
data = data.table(id = c('1','1','1','1','1','2','2','2'),
value = c(95,100,101,101,101,20,35,38))
data$new_value <- ifelse(data[ , data$value] == lag(data$value,1),
lag(data$value, 1) + 1 ,data$value)
data$desired_value <- c(95,100,101,102,103,20,35,38)
Produces:
id value new_value desired_value
1: 1 95 NA 95
2: 1 100 100 100
3: 1 101 101 101 # first 101 in id 1: add 0
4: 1 101 102 102 # second 101 in id 1: add 1
5: 1 101 102 103 # third 101 in id 1: add 2
6: 2 20 20 20
7: 2 35 35 35
8: 2 38 38 38
I tried doing this with ifelse
, but it doesn't work recursively so it only applies to the following row, and not any subsequent rows. Also the lag
function results in me losing the first value in value
.
I've seen examples with character variables with make.names
or make.unique
, but haven't been able to find a solution for a duplicated numeric value.
Background: I am doing a survival analysis and I am finding that with my data there are stop times that are the same, so I need to make it unique by adding a 1 (stop times are in seconds).
Upvotes: 6
Views: 2450
Reputation: 67778
To avoid (a potentially costly) by
, you may use rowid
:
data[, res := value + rowid(id, value) - 1]
# data
# id value new_value desired_value res
# 1: 1 95 96 95 95
# 2: 1 100 101 100 100
# 3: 1 101 102 101 101
# 4: 1 101 102 102 102
# 5: 1 101 102 103 103
# 6: 2 20 21 20 20
# 7: 2 35 36 35 35
# 8: 2 38 39 38 38
Upvotes: 1
Reputation: 93813
Here's an attempt. You're essentially grouping by id
and value
and adding 0:(length(value)-1)
. So:
data[, onemore := value + (0:(.N-1)), by=.(id, value)]
# id value new_value desired_value onemore
#1: 1 95 96 95 95
#2: 1 100 101 100 100
#3: 1 101 102 101 101
#4: 1 101 102 102 102
#5: 1 101 102 103 103
#6: 2 20 21 20 20
#7: 2 35 36 35 35
#8: 2 38 39 38 38
Upvotes: 6
Reputation: 887128
Here is one option with tidyverse
library(dplyr)
data %>%
group_by(id, value) %>%
mutate(onemore = value + row_number()-1)
# id value onemore
# <chr> <dbl> <dbl>
#1 1 95 95
#2 1 100 100
#3 1 101 101
#4 1 101 102
#5 1 101 103
#6 2 20 20
#7 2 35 35
#8 2 38 38
Or we can use base R
without anonymous function call
data$onemore <- with(data, value + ave(value, id, value, FUN =seq_along)-1)
data$onemore
#[1] 95 100 101 102 103 20 35 38
Upvotes: 5
Reputation: 388982
With base R we can use ave
where we take the first value of each group and basically add the row number of that row in that group.
data$value1 <- ave(data$value, data$id, data$value, FUN = function(x)
x[1] + seq_along(x) - 1)
# id value new_value desired_value value1
#1: 1 95 96 95 95
#2: 1 100 101 100 100
#3: 1 101 102 101 101
#4: 1 101 102 102 102
#5: 1 101 102 103 103
#6: 2 20 21 20 20
#7: 2 35 36 35 35
#8: 2 38 39 38 38
Upvotes: 5