Jeppe Olsen
Jeppe Olsen

Reputation: 1008

unique rows in data frame based on if's

I have a large data table (millions of rows), where I need to trim the rows down to one per ID. The rule is that if another art than "X" is in the unique ID, the X'es should be deleted. But if no other art is in the Unique ID, the X should stay.

Test dataset:

dt <- data.table(
  ID=c(1,1,1,2,2,3,4,4),
  art=c("X", "Y", "X", "X", "X", "X", "Z", "X"),
  redskb=c("a", "Y", "a", "b", "b", "c", "k", "n")
)

   ID art redskb
1:  1   X      a
2:  1   Y      Y
3:  1   X      a
4:  2   X      b
5:  2   X      b
6:  3   X      c
7:  4   X      k
8:  4   Z      n

Required output:

    ID art redskb
1:  1   Y      Y   
2:  2   X      b
3:  3   X      c
4:  4   Z      n

I tried with

unique(dt, by = c("ID"))

but could not get it to work efficiently with if's.

Upvotes: 0

Views: 64

Answers (3)

moodymudskipper
moodymudskipper

Reputation: 47320

data.table:

dt[order(ID,art=="X"),.SD[1],ID]

or @Frank's version:

unique(dt[order(ID,art == "X")], by="ID")

#    ID art redskb
# 1:  1   Y      Y
# 2:  2   X      b
# 3:  3   X      c
# 4:  4   Z      k

dplyr:

dt %>% group_by(ID) %>% slice(which.max(art != "X"))
#   # A tibble: 4 x 3
#   # Groups:   ID [4]
#        ID    art redskb
#     <dbl> <fctr>  <chr>
#   1     1      Y      Y
#   2     2      X      b
#   3     3      X      c
#   4     4      Z      k

Upvotes: 2

akrun
akrun

Reputation: 887148

We can do

dt[dt[, .I[if(uniqueN(art) >1 & any(art == "X")) art!="X" else  seq_len(.N)==1], ID]$V1]
#   ID art redskb
#1:  1   Y      Y
#2:  2   X      b
#3:  3   X      c
#4:  4   Z      k

Upvotes: 0

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193527

I'd try something like this:

unique(dt)[, `:=`(flag, if (.N == 1) TRUE else art != "X"), ID][(flag)]
##    ID art redskb flag
## 1:  1   Y      Y TRUE
## 2:  2   X      b TRUE
## 3:  3   X      c TRUE
## 4:  4   Z      k TRUE

Upvotes: 5

Related Questions