DOSMarter

Reputation: 1523

Convert a matrix of characters into a matrix of strings in R

I have a large matrix of characters and want to convert it to a matrix of strings without looping over each row individually. Is there a fast way to do this? I tried paste(data[,4:((i*2)+3)],collapse=""), but it combines all the rows into one very long string. I need to keep the same number of rows as the original matrix, with each row holding a single column: the string formed from the characters in that row. In other words, I want to convert the matrix

a=
{
D  E  R  P  G  K  I
S  K  P  A  S  L  N
S  K  P  A  S  L  N
S  K  P  A  S  L  N
S  K  P  A  S  L  N
}

into

a=
{
 DERPGKI
 SKPASLN
 SKPASLN
 SKPASLN
 SKPASLN
}

Upvotes: 1

Views: 7194

Answers (1)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

apply is a loop, but it should still be pretty efficient in this case. Its use would be:

apply(x, 1, paste, collapse = "")

Alternatively, you can try:

do.call(paste0, data.frame(x))

which might actually be faster, depending on the size of the data.
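
For context on why the attempt in the question produced a single string: paste() treats a matrix as one long vector, so collapse = "" joins every element, whereas apply() over rows calls paste() once per row. A minimal sketch on a small made-up matrix (the name m and its contents are illustrative, not taken from the question's data):

```r
# Hypothetical 2 x 3 character matrix, used only for illustration
m <- matrix(c("D", "S", "E", "K", "R", "P"), nrow = 2)

# paste() sees the matrix as a plain vector (column-major order),
# so collapse = "" joins all six elements into one string
all_in_one <- paste(m, collapse = "")

# apply() over rows (MARGIN = 1) collapses each row separately,
# giving one string per row
by_row <- apply(m, 1, paste, collapse = "")

# do.call() passes each column as a separate argument to paste0(),
# which then pastes the columns together element-wise
by_col <- do.call(paste0, data.frame(m))

# Wrap the result in matrix() if a one-column matrix is wanted
# rather than a plain character vector
as_matrix <- matrix(by_row, ncol = 1)
```

Both row-wise approaches return a character vector with one element per row of the input; the final matrix() call is only needed if the result really has to be a matrix.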


A reproducible example:

x <- structure(c("D", "S", "S", "S", "S", "E", "K", "K", "K", "K", 
                 "R", "P", "P", "P", "P", "P", "A", "A", "A", "A", 
                 "G", "S", "S", "S", "S", "K", "L", "L", "L", "L", 
                 "I", "N", "N", "N", "N"), .Dim = c(5L, 7L))
x
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] "D"  "E"  "R"  "P"  "G"  "K"  "I" 
# [2,] "S"  "K"  "P"  "A"  "S"  "L"  "N" 
# [3,] "S"  "K"  "P"  "A"  "S"  "L"  "N" 
# [4,] "S"  "K"  "P"  "A"  "S"  "L"  "N" 
# [5,] "S"  "K"  "P"  "A"  "S"  "L"  "N" 

Let's compare the options:

library(microbenchmark)

fun1 <- function(inmat) apply(inmat, 1, paste, collapse = "")
fun2 <- function(inmat) do.call(paste0, data.frame(inmat))

fun1(x)
# [1] "DERPGKI" "SKPASLN" "SKPASLN" "SKPASLN" "SKPASLN"
fun2(x)
# [1] "DERPGKI" "SKPASLN" "SKPASLN" "SKPASLN" "SKPASLN"

microbenchmark(fun1(x), fun2(x))
# Unit: microseconds
#     expr      min        lq    median        uq      max neval
#  fun1(x)   97.634  104.4805  112.0725  117.7735  268.503   100
#  fun2(x) 1258.000 1282.6275 1301.5555 1316.5015 1576.506   100

And on longer data:

X <- do.call(rbind, replicate(100000, x, simplify=FALSE))
dim(X)
# [1] 500000      7

microbenchmark(fun1(X), fun2(X), times = 10)
# Unit: milliseconds
#     expr       min        lq    median       uq      max neval
#  fun1(X) 4189.8940 4226.9354 4382.0403 4570.032 4596.983    10
#  fun2(X)  825.9816  835.4351  888.5102 1031.509 1056.832    10

I suspect that on wider data, apply would still be more efficient though.
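
The suspicion about wider data is not benchmarked above. One way to check it on your own machine is sketched here with base R's system.time() and a made-up matrix; the dimensions, the seed, the repetition count, and the name wide are all assumptions chosen for illustration:

```r
# Same two approaches as compared above
fun1 <- function(inmat) apply(inmat, 1, paste, collapse = "")
fun2 <- function(inmat) do.call(paste0, data.frame(inmat))

# Hypothetical wide matrix: 50 rows by 2000 columns of random letters
set.seed(1)
wide <- matrix(sample(LETTERS, 50 * 2000, replace = TRUE), nrow = 50)

# Both approaches should agree before their timings are compared
stopifnot(identical(fun1(wide), fun2(wide)))

# Base R timing over repeated calls; the numbers depend on the
# machine and R version, so no expected output is shown
system.time(for (i in 1:20) fun1(wide))
system.time(for (i in 1:20) fun2(wide))
```

If microbenchmark is installed, microbenchmark(fun1(wide), fun2(wide)) would give a finer-grained comparison in the same style as the tables above.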

Upvotes: 5
