Reputation: 631
Suppose I have the following data.table:
dt <- data.table(a = 1:2, b = 1:2, c = c(1, 1))
# dt
# a b c
# 1: 1 1 1
# 2: 2 2 1
What would be the fastest way to create a fourth column d indicating whether the preexisting values in each row are all identical, so that the resulting data.table looks like the following?
# dt
# a b c d
# 1: 1 1 1 identical
# 2: 2 2 1 not_identical
I want to avoid the duplicated function and stick to identical or a similar function, even if it means iterating through the items within each row.
Upvotes: 2
Views: 652
Reputation: 101044
Here is another data.table option using var: a row whose values are all equal has zero variance.
dt[, d := ifelse(var(unlist(.SD)) == 0, "identical", "not_identical"), by = seq_len(nrow(dt))]
which gives
> dt
   a b c             d
1: 1 1 1     identical
2: 2 2 1 not_identical
Upvotes: 2
Reputation: 886938
uniqueN can be applied grouped by row to build a logical expression (== 1), which is then used to index the two labels:
library(data.table)
dt[, d := c("not_identical", "identical")[(uniqueN(unlist(.SD)) == 1) + 1], 1:nrow(dt)]
Output:
dt
# a b c d
#1: 1 1 1 identical
#2: 2 2 1 not_identical
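Another approach, likely more efficient because it avoids grouping by row, is to compare every column with the first one and build the index with rowSums. The comparison needs its own parentheses: + binds tighter than >, so 1 + rowSums(...) > 0 would always evaluate to TRUE instead of 1 or 2.
dt[, d := c("identical", "not_identical")[1 + (rowSums(.SD[[1]] != .SD) > 0)]]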
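To check that the grouped and the vectorised versions agree, it can help to run both on a table with more than two rows. The following is only a sketch: dt2, cols, d_grouped and d_vectorised are hypothetical names that do not appear in the question, and .SDcols is used so that the newly added result columns are not compared with the data columns.

```r
library(data.table)

# Hypothetical example table (not from the question): four rows, three columns.
dt2 <- data.table(a = c(1, 2, 3, 4), b = c(1, 2, 5, 4), c = c(1, 1, 3, 4))
cols <- c("a", "b", "c")

# Row-wise version: count the distinct values in each row.
dt2[, d_grouped := c("not_identical", "identical")[(uniqueN(unlist(.SD)) == 1) + 1],
    by = seq_len(nrow(dt2)), .SDcols = cols]

# Vectorised version: compare every column with the first one.
# The parenthesised comparison yields FALSE/TRUE, so the index is 1 or 2.
dt2[, d_vectorised := c("identical", "not_identical")[1 + (rowSums(.SD[[1]] != .SD) > 0)],
    .SDcols = cols]

print(dt2)
```

Both columns should carry the same labels for every row. On wide tables the vectorised version would be expected to run faster, since it avoids one group per row, but that is worth benchmarking on your own data.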
Upvotes: 6