Reputation: 127
I am trying to create a matrix of character strings of varying length.
So far, I haven't been able to access the elements of the matrix properly in order to apply them to a new one.
ranNumsVec <- runif(1000, min = 100, max = 1000)
ranNumsVec <- round(ranNumsVec, digits = 0)
clusterSeqLengths <- matrix(data = ranNumsVec, nrow = 10, ncol = 100,
byrow = FALSE, dimnames = NULL)
clusterSeqs <- matrix(data = NA, nrow = 10, ncol = 100, byrow = FALSE, dimnames = NULL)
^ These are fine
With the following functions, I am trying to fill a separate matrix (a matrix of strings) with characters drawn with certain probabilities, such that the length of each string in the matrix is determined by one of the random numbers stored in ranNumsVec above. In the end, I am looking to create a matrix of 1000 sequences of A, T, G and C characters, with lengths from 100 to 1000 as indicated above.
lengthSmallString <- function(clusterSeqLengths)
clusterSeqs <- paste(sample(c("A", "C", "G", "T"), clusterSeqLengths, replace=TRUE, prob=c(0.2, 0.55, 0.1, 0.15)))
fillCharsToLength <- function(clusterSeqs)
clusterSeqs <- apply(clusterSeqs, 2, lengthSmallString, simplify = TRUE, USE.NAMES = FALSE)
I am not entirely sure how to properly iterate through the matrix and apply the paste function to build a string of a given length. I tried a for loop, but it didn't get me very far:
for(i=1:nume1(array) in clusterVectorNums)
{
for(j in clusterVectorNums)
{
seqLength <- ranNumsVec[i,j]
clusterSeqs[i,j] <- paste(sample(c("A", "C", "G", "T"),
seqLength, replace=TRUE ,prob=c(0.2, 0.55, 0.1, 0.15)),
collapse="")
}
}
Reputation: 1549
If I understand your problem correctly, if you have a 5 in clusterSeqLengths[1,1] you are expecting a sequence of randomly sampled values from c("A","C","G","T") of length 5 as a single string in your final output clusterSeqs[1,1], and you would like to repeat this process for every cell in clusterSeqLengths. Assuming this is the case, you could do this using apply.
I have reduced the numbers and the size of your example so that the results fit in my post.
set.seed(1) # initialise RNG seed for reproducible results
ranNumsVec <- runif(10, min = 0, max = 5)
ranNumsVec <- round(ranNumsVec, digits = 0)
clusterSeqLengths <- matrix(data = ranNumsVec, nrow = 5, ncol = 2,
byrow = FALSE, dimnames = NULL)
# first make a function which takes an n for
# how long the sequence should be and returns the
# relevant sequence
f = function(n){
  paste(
    sample(c("A", "C", "G", "T"),
           n, replace=TRUE, prob=c(0.2, 0.55, 0.1, 0.15)
    ),
    collapse="")
}
clusterSeqLengths
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    3
## [4,]    5    3
## [5,]    1    0
# check it works on one value
f(clusterSeqLengths[1,1])
## [1] "C"
Then use apply with MARGIN = c(1,2) to apply the function f to each cell:
(clusterSeq = apply(clusterSeqLengths,c(1,2),f))
##      [,1]    [,2]
## [1,] "C"     "CCCC"
## [2,] "AC"    "CTCCA"
## [3,] "TCA"   "CCT"
## [4,] "GCTGC" "ATC"
## [5,] "A"     ""
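As a rough sketch of how this might scale back up to the sizes in your question (a 10 x 100 matrix with lengths between 100 and 1000), something like the following should work. The name randomSeq is just a hypothetical, more descriptive name for f above, and the seed is arbitrary. I have also included a nested for loop that does the same job, in case you prefer that form; the main differences from your attempt are that it indexes clusterSeqLengths (the matrix) rather than ranNumsVec (the vector), and that it loops over row and column indices.

```r
set.seed(42)  # arbitrary seed, only so the run is reproducible

# Same helper as f above, under a hypothetical, more descriptive name
randomSeq <- function(n) {
  paste(sample(c("A", "C", "G", "T"), n, replace = TRUE,
               prob = c(0.2, 0.55, 0.1, 0.15)),
        collapse = "")
}

# Sequence lengths between 100 and 1000, arranged as in the question
clusterSeqLengths <- matrix(round(runif(1000, min = 100, max = 1000)),
                            nrow = 10, ncol = 100)

# Option 1: apply over every cell (MARGIN = c(1, 2))
clusterSeqs <- apply(clusterSeqLengths, c(1, 2), randomSeq)

# Option 2: an explicit nested loop over row and column indices
loopSeqs <- matrix(NA_character_,
                   nrow = nrow(clusterSeqLengths),
                   ncol = ncol(clusterSeqLengths))
for (j in seq_len(ncol(clusterSeqLengths))) {
  for (i in seq_len(nrow(clusterSeqLengths))) {
    loopSeqs[i, j] <- randomSeq(clusterSeqLengths[i, j])
  }
}

# Sanity checks: each string should have the requested length
dim(clusterSeqs)
all(nchar(clusterSeqs) == clusterSeqLengths)
all(nchar(loopSeqs) == clusterSeqLengths)
```

Both options draw fresh random characters, so I would not expect the two matrices to contain the same strings; only the lengths should match.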