Jeppe Olsen

Reputation: 1008

Conditional sorting of a data.table

I have a very large data.table that I want to trim down in this fashion:

  1. Only one row per unique id

  2. If there is any value other than "X" in the same log, that other value should stay

  3. If there is only "X", then the first "X" should stay

  4. If there is more than one value other than "X", then all of those should stay, separated by commas, but not the "X".

Sample dataset:

library(data.table)
dt <- data.table(
    id=c(1,1,2,3,3,4,4,4,5,5),
    log=c(11,11,11,12,12,12,12,12,13,13),
    art=c("X", "Y", "X", "X", "X", "Z", "X", "Y","X", "X")
  )
dt
    id log art
 1:  1  11   X
 2:  1  11   Y
 3:  2  11   X
 4:  3  12   X
 5:  3  12   X
 6:  4  12   Z
 7:  4  12   X
 8:  4  12   Y
 9:  5  13   X
10:  5  13   X

Required output:

 id    log  art    
  1     11   Y    
  2     11   Y
  3     12   Z,Y
  4     12   Z,Y
  5     13   X

Upvotes: 1

Views: 366

Answers (3)

lmo

Reputation: 38500

Here is one method, though there may be a more efficient approach.

unique(dt[,.(id, log)])[dt[, .(art=if(.N == 1 | all(art == "X"))
                                      art[1] else toString(unique(art[art != "X"]))),
                           by=log], on="log"]

which returns

   id log  art
1:  1  11    Y
2:  2  11    Y
3:  3  12 Z, Y
4:  4  12 Z, Y
5:  5  13    X

This performs a left join of the desired values of art for each log onto the unique pairs of id and log. It assumes that no id spans two logs, which is the case in the example.
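The assumption that no id spans two logs can be checked before running the join. A minimal sketch on the sample data from the question; the names `logs_per_id` and `ids_multi_log` are only illustrative and not part of the original answer:

```r
library(data.table)

dt <- data.table(
  id  = c(1, 1, 2, 3, 3, 4, 4, 4, 5, 5),
  log = c(11, 11, 11, 12, 12, 12, 12, 12, 13, 13),
  art = c("X", "Y", "X", "X", "X", "Z", "X", "Y", "X", "X")
)

# Count the distinct logs seen for each id
logs_per_id <- dt[, .(n_logs = uniqueN(log)), by = id]

# Any id listed here appears in more than one log and breaks the assumption
ids_multi_log <- logs_per_id[n_logs > 1, id]

# TRUE when every id belongs to exactly one log
length(ids_multi_log) == 0
```

If `ids_multi_log` is not empty, those ids would get one output row per log they appear in, so the result would no longer have a single row per id.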

Upvotes: 2

M--

Reputation: 28825

Just wanted to try this with dplyr:

library(data.table)
library(dplyr)
dat <- setDT(dt %>%
               group_by(id) %>%
               unique() %>%
               summarise(bldlog = mean(log),
                         art = gsub("X,|,X", "", paste(art, collapse = ","))))


dat

#    id bldlog art 
# 1:  1     11   Y 
# 2:  2     11   X 
# 3:  3     12   X 
# 4:  4     12 Z,Y 
# 5:  5     13   X

Upvotes: 0

akrun

Reputation: 886948

We can try

dt[,  .(art = if(all(art=="X")) "X" else 
     toString(unique(art[art != "X"]))), .(id, logbld = log)]
#    id logbld  art
#1:  1     11    Y
#2:  2     11    X
#3:  3     12    X
#4:  4     12 Z, Y
#5:  5     13    X
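
The result above is computed per id, so ids 2 and 3 keep "X". If the value is meant to be shared by every id in the same log, as the required output in the question suggests, one possible variant is to compute it by log and then keep one row per id. This is only a sketch under that reading of the question; `out` is an illustrative name, and `copy()` is used so the original `dt` is not modified by reference:

```r
library(data.table)

dt <- data.table(
  id  = c(1, 1, 2, 3, 3, 4, 4, 4, 5, 5),
  log = c(11, 11, 11, 12, 12, 12, 12, 12, 13, 13),
  art = c("X", "Y", "X", "X", "X", "Z", "X", "Y", "X", "X")
)

# Overwrite art with the per-log summary: "X" if the log holds only "X",
# otherwise the distinct non-"X" values joined by toString()
out <- copy(dt)[, art := if (all(art == "X")) "X" else
                  toString(unique(art[art != "X"])), by = log]

# Keep the first row for each id
out <- unique(out, by = "id")
out
```

Note that `toString()` separates values with a comma and a space; `paste(..., collapse = ",")` could be used instead if no space is wanted.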

Upvotes: 2
