TheRimalaya

Reputation: 4592

Sort characters within each string of a vector in R

I have data like this:

df <- structure(list(Sex = structure(c(1L, 1L, 2L, 1L, 2L, 2L, 1L, 
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("F", "M"), class = "factor"), 
    Age = c(19L, 16L, 16L, 13L, 16L, 30L, 16L, 30L, 16L, 30L, 
    30L, 16L, 19L, 1L, 30L), I = c(1, 1, 0, 0, 1, 0, 1, 0, 1, 
    0, 0, 0, 1, 0, 1), E = c(0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 
    1, 0, 1, 0), S = c(1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 
    0, 1), N = c(0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0), 
    F = c(1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1), T = c(0, 
    1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0), C = c(1, 1, 1, 
    0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1), D = c(0, 0, 0, 1, 0, 
    1, 0, 1, 0, 1, 1, 1, 1, 0, 0), type = c("CIFS", "CITN", "CESF", 
    "DEFS", "CIFN", "DETS", "CITS", "DEFS", "CIFN", "DEFN", "DETS", 
    "DETS", "DINF", "CENT", "CIFS"), PO = runif(15, -3, 3), AO = runif(15, -3, 3)), .Names = c("Sex", 
"Age", "I", "E", "S", "N", "F", "T", "C", "D", "type", "PO", 
"AO"), class = c("tbl_dt", "tbl", "data.table", "data.frame"), row.names = c(NA, 
-15L))

I want to sort the characters within each string of the column type (not the rows of the column) and keep the same structure afterwards. For example, CIFS should become CFIS. I tried it like this:

df <- within(df, {
    type <- apply(sapply(strsplit(df[, type], split=''), sort), 2, 
        function(x) paste0(x, collapse = ''))
})

Is there a simpler solution that I have missed?

Upvotes: 0

Views: 651

Answers (2)

digEmAll

Reputation: 57220

This should work for both data.frame and data.table (base R only):

df$type <- vapply(strsplit(df$type, split = ""), function(x) paste(sort(x), collapse = ""), FUN.VALUE = character(1))

Result:

> df
   Sex Age I E S N F T C D type         PO         AO
1    F  19 1 0 1 0 1 0 1 0 CFIS  2.9750666  2.0308410
2    F  16 1 0 0 1 0 1 1 0 CINT  0.7902187  2.0891158
3    M  16 0 1 1 0 1 0 1 0 CEFS -1.7173785  2.4774140
4    F  13 0 1 1 0 1 0 0 1 DEFS  1.5352127 -1.9272470
5    M  16 1 0 0 1 1 0 1 0 CFIN -0.2160741  1.7359897
6    M  30 0 1 1 0 0 1 0 1 DEST  2.6314981 -0.6252466
7    F  16 1 0 1 0 0 1 1 0 CIST -1.6032894 -1.9938226
8    M  30 0 1 1 0 1 0 0 1 DEFS  0.7748583 -2.0935737
9    F  16 1 0 0 1 1 0 1 0 CFIN -2.9368356  0.3363364
10   F  30 0 1 0 1 1 0 0 1 DEFN -0.6506217  2.6681535
11   F  30 0 1 1 0 0 1 0 1 DEST -0.4432578  0.4627441
12   F  16 0 1 1 0 0 1 0 1 DEST  2.0236760  2.7684298
13   F  19 1 0 0 1 1 0 0 1 DFIN -1.1774931  2.6546726
14   F   1 0 1 0 1 0 1 1 0 CENT -2.2365388  2.7902646
15   F  30 1 0 1 0 1 0 1 0 CFIS -1.6139238 -2.4982620
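
For reuse, the same split-sort-paste idea can be wrapped in a small function. This is only a sketch: the name `sort_chars` is hypothetical and not part of the answer above, and the input vector is a handful of values taken from the question's type column.

```r
# Hypothetical helper (name is an assumption, not from the answer):
# sorts the characters inside each element of a character vector.
sort_chars <- function(x) {
  vapply(
    strsplit(x, split = ""),                            # one vector of single characters per string
    function(chars) paste(sort(chars), collapse = ""),  # sort them and glue them back together
    FUN.VALUE = character(1)                            # each result must be a single string
  )
}

type <- c("CIFS", "CITN", "CESF", "DINF")
sorted <- sort_chars(type)
print(sorted)
```

With a helper like this, the assignment would read `df$type <- sort_chars(df$type)`.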

Upvotes: 3

RInatM

Reputation: 1208

Since you are using data.table, I would suggest

df[, type := paste(sort(unlist(strsplit(type, ""))), collapse = ""), by = type]

as described in "How to sort letters in a string?"
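
As far as I understand it, grouping with `by = type` means the split-sort-paste expression is evaluated once per distinct value of type rather than once per row. A rough base R sketch of that same "once per unique value" idea, using a few values from the question (the variable names `u`, `sorted_u` and `sorted` are illustrative only):

```r
type <- c("CIFS", "CITN", "CIFS", "DETS", "DETS")

# Sort the letters of each distinct value only once...
u <- unique(type)
sorted_u <- vapply(
  strsplit(u, split = ""),
  function(chars) paste(sort(chars), collapse = ""),
  FUN.VALUE = character(1)
)

# ...then map the results back onto the original positions.
sorted <- sorted_u[match(type, u)]
print(sorted)
```

This should only matter when the column has many repeated values; for 15 rows either approach is fine.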

Upvotes: 4
