LeGeniusII
LeGeniusII

Reputation: 960

R data.table generate unique values of each column

I have a data.table, and I need to generate another data.table that lists only unique values of each column. An example:

From

> sourceDT <- data.table(ID = c(1,2,3,4), date = c(ymd("20110101"),ymd("20110101"),ymd("20130101"),ymd("20150101")), text = c("A","B","C","C"))
> sourceDT
   ID       date text
1:  1 2011-01-01    A
2:  2 2011-01-01    B
3:  3 2013-01-01    C
4:  4 2015-01-01    C

To

> outputDT <- data.table(ID = c(1,2,3,4),date = c(ymd("20110101"),ymd("20130101"),ymd("20150101"),NA), text = c("A","B","C",NA))
> outputDT
   ID       date text
1:  1 2011-01-01    A
2:  2 2013-01-01    B
3:  3 2015-01-01    C
4:  4       <NA>   NA

What's the most efficient way of achieving this?

Upvotes: 1

Views: 383

Answers (2)

Onyambu
Onyambu

Reputation: 79228

one way you can do: (Though I guess not very Efficient:)

sourceDT[,lapply(.SD, function(x)sort(`is.na<-`(x,duplicated(x)),na.last = T))]
   ID       date text
1:  1 2011-01-01    A
2:  2 2013-01-01    B
3:  3 2015-01-01    C
4:  4       <NA>   NA

Upvotes: 2

zacdav
zacdav

Reputation: 4671

This is what I've understood your question to be, the demo you have doesn't quite match what you explained in response to my comment.

This finds duplicates in each column and sets any to NA, this preserves the data.table size.

library(data.table)
library(lubridate)
sourceDT <- data.table(ID = c(1,2,3,4), date = c(ymd("20110101"),ymd("20110101"),ymd("20130101"),ymd("20150101")), text = c("A","B","C","C"))


for (i in seq_along(sourceDT)) {
  dupes <- which(duplicated(sourceDT[[i]]))
  if (length(dupes > 0)) {
    set(sourceDT, dupes, i, NA)
  }
}

#    ID       date text
# 1:  1 2011-01-01    A
# 2:  2       <NA>    B
# 3:  3 2013-01-01    C
# 4:  4 2015-01-01   NA

Upvotes: 1

Related Questions