Chris
Chris

Reputation: 2256

Sorting within row (conditional)

I need to sort values for some columns per row, but only if some other reference columns for same row have identical values (skipping NA and NULL). The columns are of different types (to sort are characters, reference are lists).

MWE below. Only row 2 should NOT be sorted. DT.is is the data.table to use, and DT.want is what is needed.

library(data.table)
DT.is <- data.table(Col_A=c("V2", "V2", "V2", "V2"),
                                 Col_B=c("V3", "V3", "V1", "V1"),
                                 Col_C=c("V1", "V1", NA_character_, NA_character_),
                                 Ref_A=list(c("RA", "RB"), "RB", "RA", "RA"),
                                 Ref_B=list(c("RA", "RB"), c("RA", "RB"), "RA", "RA"),
                                 Ref_C=list(c("RA", "RB"), "RA", NULL, NA_character_))

DT.want <- data.table(Col_A=c("V1", "V2", "V1", "V1"),
                                        Col_B=c("V2", "V3", "V2", "V2"),
                                        Col_C=c("V3", "V1", NA_character_, NA_character_),
                                        Ref_A=list(c("RA", "RB"), "RB", "RA", "RA"),
                                        Ref_B=list(c("RA", "RB"), c("RA", "RB"), "RA", "RA"),
                                        Ref_C=list(c("RA", "RB"), "RA", NULL, NA_character_))

The original data.table has many rows.

Upvotes: 1

Views: 75

Answers (1)

moodymudskipper
moodymudskipper

Reputation: 47300

It's been quite a ride :) :

I have no idea how to do it with data.table so here's a tidyverse solution :

library(tidyverse)
DT.is %>%
  rownames_to_column %>%
  gather(key,value,-1) %>%
  separate(key,c("type","V")) %>%
  spread(type,value) %>%
  mutate(Ref2=as.character(Ref),
         Col = unlist(Col)) %>%
  group_by(rowname,Ref2) %>%
  mutate(Col = sort(Col,na.last=TRUE)) %>%
  ungroup %>%
  select(-Ref2) %>%
  gather(type,value,Col,Ref) %>%
  unite(key,type,V) %>%
  spread(key,value) %>%
  as.data.table

#    rowname Col_A Col_B Col_C Ref_A Ref_B Ref_C
# 1:       1    V1    V2    V3 RA,RB RA,RB RA,RB
# 2:       2    V2    V3    V1    RB RA,RB    RA
# 3:       3    V1    V2    NA    RA    RA  NULL
# 4:       4    V1    V2    NA    RA    RA    NA

I'm first converting to long format because this way i'll be able to group by your Ref values. Then I sort by initial row id and Ref value (converted to character temporarily as one can't group by list) and then I climb all the way abck to initial format.

Upvotes: 1

Related Questions