I have a large matrix of characters, and I want to convert it to a matrix of strings without looping over each row individually. Is there a smart way to do this quickly? I tried paste(data[,4:((i*2)+3)],collapse=""), but it combines all the rows into one very long string. I need the result to keep the same number of rows as the original matrix, with each row holding a single column: the string made of the characters in that row. In other words, I want to convert the matrix
a=
{
D E R P G K I
S K P A S L N
S K P A S L N
S K P A S L N
S K P A S L N
}
into
a=
{
DERPGKI
SKPASLN
SKPASLN
SKPASLN
SKPASLN
}
apply is a loop, but it should still be pretty efficient in this case. You would use it like this:
apply(x, 1, paste, collapse = "")
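Note that apply() returns a character vector (one string per row), not the one-column matrix the question describes. A minimal sketch, using a small hypothetical matrix m in place of the real data, of wrapping the result with matrix() to get that shape while keeping the row count:

```r
# Small hypothetical matrix standing in for the real data.
m <- matrix(c("D", "E", "R",
              "S", "K", "P"), nrow = 2, byrow = TRUE)

# MARGIN = 1 applies paste(..., collapse = "") to each row,
# giving one string per row as a character vector.
v <- apply(m, 1, paste, collapse = "")
v
# [1] "DER" "SKP"

# Wrap the vector in matrix() to get a one-column matrix
# with the same number of rows as m.
res <- matrix(v, ncol = 1)
dim(res)
# [1] 2 1
```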
Alternatively, you can try:
do.call(paste0, data.frame(x))
which might actually be faster.
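The do.call() version works because data.frame(x) is a list with one element per column, so do.call() hands each column to paste0() as a separate argument, and paste0() is vectorised across those arguments element by element. The sketch below checks that equivalence on a small hypothetical matrix m; the lapply() variant that builds the column list without data.frame() is an illustrative addition, not one of the options benchmarked in this answer.

```r
# Small hypothetical matrix standing in for the real data.
m <- matrix(c("D", "E", "R",
              "S", "K", "P"), nrow = 2, byrow = TRUE)

# One argument per column, passed to paste0() via do.call().
a <- do.call(paste0, data.frame(m))

# The same call written out by hand: paste0() pastes element-wise.
b <- paste0(m[, 1], m[, 2], m[, 3])

# Illustrative variant: build the list of columns directly,
# skipping the data frame construction.
cols <- lapply(seq_len(ncol(m)), function(j) m[, j])
d <- do.call(paste0, cols)

a
# [1] "DER" "SKP"
```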
Here's a reproducible example, since the question doesn't include one:
x <- structure(c("D", "S", "S", "S", "S", "E", "K", "K", "K", "K",
"R", "P", "P", "P", "P", "P", "A", "A", "A", "A",
"G", "S", "S", "S", "S", "K", "L", "L", "L", "L",
"I", "N", "N", "N", "N"), .Dim = c(5L, 7L))
x
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] "D"  "E"  "R"  "P"  "G"  "K"  "I"
# [2,] "S"  "K"  "P"  "A"  "S"  "L"  "N"
# [3,] "S"  "K"  "P"  "A"  "S"  "L"  "N"
# [4,] "S"  "K"  "P"  "A"  "S"  "L"  "N"
# [5,] "S"  "K"  "P"  "A"  "S"  "L"  "N"
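The structure() call fills the 5 x 7 matrix column by column (R's column-major order), which is why its vector starts with the first column (D, S, S, S, S) rather than the first row. A quick sketch confirming that it matches a row-by-row construction with byrow = TRUE:

```r
# The same matrix as x above, built column-major with structure().
x <- structure(c("D", "S", "S", "S", "S", "E", "K", "K", "K", "K",
                 "R", "P", "P", "P", "P", "P", "A", "A", "A", "A",
                 "G", "S", "S", "S", "S", "K", "L", "L", "L", "L",
                 "I", "N", "N", "N", "N"), .Dim = c(5L, 7L))

# Rebuild it row by row: byrow = TRUE fills across each row first.
rows <- c("D", "E", "R", "P", "G", "K", "I",
          rep(c("S", "K", "P", "A", "S", "L", "N"), 4))
y <- matrix(rows, nrow = 5, byrow = TRUE)

identical(x, y)
# [1] TRUE
```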
Let's compare the options:
library(microbenchmark)
fun1 <- function(inmat) apply(inmat, 1, paste, collapse = "")
fun2 <- function(inmat) do.call(paste0, data.frame(inmat))
fun1(x)
# [1] "DERPGKI" "SKPASLN" "SKPASLN" "SKPASLN" "SKPASLN"
fun2(x)
# [1] "DERPGKI" "SKPASLN" "SKPASLN" "SKPASLN" "SKPASLN"
microbenchmark(fun1(x), fun2(x))
# Unit: microseconds
#     expr      min        lq    median        uq      max neval
#  fun1(x)   97.634  104.4805  112.0725  117.7735  268.503   100
#  fun2(x) 1258.000 1282.6275 1301.5555 1316.5015 1576.506   100
And on longer data (500,000 rows):
X <- do.call(rbind, replicate(100000, x, simplify=FALSE))
dim(X)
# [1] 500000 7
microbenchmark(fun1(X), fun2(X), times = 10)
# Unit: milliseconds
#     expr       min        lq    median       uq      max neval
#  fun1(X) 4189.8940 4226.9354 4382.0403 4570.032 4596.983    10
#  fun2(X)  825.9816  835.4351  888.5102 1031.509 1056.832    10
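I suspect that on wider data, apply would still be more efficient, though: apply makes one paste call per row, so its cost grows mainly with the number of rows, while the do.call approach has to build a data frame with one column per matrix column and pass each of those columns to paste0 as a separate argument.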
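One way to test that suspicion is to time both functions on a matrix with few rows and many columns. A rough sketch using base R's system.time() instead of microbenchmark, to avoid the package dependency; the dimensions (5 x 2,000) and the random letters are arbitrary assumptions, and no timings are claimed here:

```r
# Same two functions as above, repeated so this block runs on its own.
fun1 <- function(inmat) apply(inmat, 1, paste, collapse = "")
fun2 <- function(inmat) do.call(paste0, data.frame(inmat))

# Hypothetical wide input: 5 rows, 2,000 columns of random letters.
set.seed(1)
W <- matrix(sample(LETTERS, 5 * 2000, replace = TRUE), nrow = 5)

# Both approaches should produce identical strings ...
stopifnot(identical(fun1(W), fun2(W)))

# ... so only the timings need comparing.
system.time(fun1(W))
system.time(fun2(W))
```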