Reputation: 1008
I have a large data table (millions of rows), where I need to trim the rows down to one per ID. The rule is that if another art than "X" is in the unique ID, the X'es should be deleted. But if no other art is in the Unique ID, the X should stay.
Test dataset:
dt <- data.table(
ID=c(1,1,1,2,2,3,4,4),
art=c("X", "Y", "X", "X", "X", "X", "Z", "X"),
redskb=c("a", "Y", "a", "b", "b", "c", "k", "n")
)
ID art redskb
1: 1 X a
2: 1 Y Y
3: 1 X a
4: 2 X b
5: 2 X b
6: 3 X c
7: 4 X k
8: 4 Z n
Required output:
ID art redskb
1: 1 Y Y
2: 2 X b
3: 3 X c
4: 4 Z n
I tried with
unique(dt, by = c("ID"))
but could not get it to work efficiently with if's.
Upvotes: 0
Views: 64
Reputation: 47320
data.table:
dt[order(ID,art=="X"),.SD[1],ID]
or @Frank's version:
unique(dt[order(ID,art == "X")], by="ID")
# ID art redskb
# 1: 1 Y Y
# 2: 2 X b
# 3: 3 X c
# 4: 4 Z k
dplyr:
dt %>% group_by(ID) %>% slice(which.max(art != "X"))
# # A tibble: 4 x 3
# # Groups: ID [4]
# ID art redskb
# <dbl> <fctr> <chr>
# 1 1 Y Y
# 2 2 X b
# 3 3 X c
# 4 4 Z k
Upvotes: 2
Reputation: 887148
We can do
dt[dt[, .I[if(uniqueN(art) >1 & any(art == "X")) art!="X" else seq_len(.N)==1], ID]$V1]
# ID art redskb
#1: 1 Y Y
#2: 2 X b
#3: 3 X c
#4: 4 Z k
Upvotes: 0
Reputation: 193527
I'd try something like this:
unique(dt)[, `:=`(flag, if (.N == 1) TRUE else art != "X"), ID][(flag)]
## ID art redskb flag
## 1: 1 Y Y TRUE
## 2: 2 X b TRUE
## 3: 3 X c TRUE
## 4: 4 Z k TRUE
Upvotes: 5