Reputation: 2289
I have a dataframe with two columns. I want the "id" column to be unique, and the value for the non-repeated id put the same value, and for the repeated id that the value is NA.
library(data.table)
DT <- data.table(id = c(1,2,3,3,4,5,5), value = c(17,13,8,NA,9,NA,11))
DT
id value
1: 1 17
2: 2 13
3: 3 8
4: 3 NA
5: 4 9
6: 5 NA
7: 5 11
Expected output
id value
1: 1 17
2: 2 13
3: 3 NA
4: 4 9
5: 5 NA
Upvotes: 3
Views: 100
Reputation: 39858
I see that you are mainly interested in data.table
solutions, but for completeness, one dplyr
possibility could be:
DT %>%
group_by(id) %>%
slice(which.max(is.na(value)))
id value
<dbl> <dbl>
1 1 17
2 2 13
3 3 NA
4 4 9
5 5 NA
Upvotes: 1
Reputation: 4284
This should do the trick, get the min value by id, if there is a NA, NA will be returned
DT[,.(value=min(value)),.(id)]
library(data.table)
set.seed(1)
DT <- data.table(id = sample(1:1e6,1e8,replace=TRUE),
value = ifelse(runif(1e7) < 0.99,
sample(1:1e6,1e8,replace=TRUE),
NA))
# my solution with min
ptm <- proc.time()
DT[,.(value=min(value)),.(id)]
proc.time() - ptm
# user system elapsed
# 6.34 1.67 2.89
# mt1022's solution
ptm <- proc.time()
DT[, .(value = if(.N == 1) value else NA_real_), by = .(id)]
proc.time() - ptm
# user system elapsed
# 6.61 1.35 4.61
Upvotes: 4
Reputation: 17289
Here is one option:
> DT[, .(value = if(.N == 1) value else NA_real_), by = .(id)]
id value
1: 1 17
2: 2 13
3: 3 NA
4: 4 9
5: 5 NA
Upvotes: 6