newbie
newbie

Reputation: 907

How to remove all duplicated rows in data.table in r

Let's say we have

library(data.table)    
dt <- data.table(Date = c(201405,201405,201504,201505, 201505,201505), ID = c(500,500,600,700,500, 700), INC = c(20,30,50,75,80,90))

return,

     Date  ID INC
1: 201405 500  20
2: 201405 500  30
3: 201504 600  50
4: 201505 700  75
5: 201505 500  80
6: 201505 700  90

I want to remove all IDs that are in the same Date. The return should be

     Date  ID INC
1: 201504 600  50
2: 201505 500  80

Could you please suggest?

Upvotes: 5

Views: 2179

Answers (1)

akrun
akrun

Reputation: 887971

We group by 'ID', get a logical index with duplicated on the 'Date', and negate so that all the unique elements are now TRUE, use .I to get the row index, extract the index column 'V1' and use that to subset 'dt'.

dt[dt[, .I[!(duplicated(Date)|duplicated(Date, fromLast=TRUE))], ID]$V1]
#      Date  ID INC
#1: 201505 500  80
#2: 201504 600  50

Or another option would be to group by 'Date', 'ID' and if the nrow is equal to 1 (.N==1), we get the Subset of Data.table (.SD).

dt[, if(.N==1) .SD, .(Date, ID)]
#     Date  ID INC
#1: 201504 600  50
#2: 201505 500  80

Or as @Frank mentioned, we can use a data.table/base R combo

DT[ave(seq(.N), Date, ID, FUN = function(x) length(x) == 1L)]

Upvotes: 7

Related Questions