Reputation: 1440
I have the following vector v:
c("tactagcaatacgcttgcgttcggtggttaagtatgtataatgcgcgggcttgtcgt",
"tgctatcctgacagttgtcacgctgattggtgtcgttacaatctaacgcatcgccaa",
"gtactagagaactagtgcattagcttatttttttgttatcatgctaaccacccggcg")
I'm facing a very frustrating issue here. Each element of this vector is a DNA sequence. What I want to do is split each element into two-letter pairs and count the occurrences of each pair. The desired output for the first element would be exactly this:
AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
 3  2  2  4  1  0  6  3  0  6  4  7  7  2  5  4
This result is easily obtained with the function oligonucleotideFrequency. The problem is that I cannot apply this function over a list or a vector using sapply or lapply, and I don't understand what the problem is or how to fix it.
If I do:
oligonucleotideFrequency(DNAString(v[1]), width = 2)
it works and I get this output:
AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
 3  2  2  4  1  0  6  3  0  6  4  7  7  2  5  4
But if I do:
v <- DNAString(v)
lapply(v, oligonucleotideFrequency(v, width = 2))
this is what I get:
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘oligonucleotideFrequency’ for signature ‘"list"’
The same occurs with sapply.
If I check the class of v after applying the DNAString function, it returns "list", so I don't get where the problem is.
Even if I do:
oligonucleotideFrequency(v[1], width = 2)
it returns:
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘oligonucleotideFrequency’ for signature ‘"list"’
I'm totally lost here, please help. I've spent hours racking my brain over this. How can I fix this problem? I want to apply this function to the whole vector at once.
PS: The R package containing this function is called Biostrings, and it can be downloaded and installed from here.
Thanks in advance
Upvotes: 2
Views: 483
Reputation: 32558
x = c("tactagcaatacgcttgcgttcggtggttaagtatgtataatgcgcgggcttgtcgt",
"tgctatcctgacagttgtcacgctgattggtgtcgttacaatctaacgcatcgccaa",
"gtactagagaactagtgcattagcttatttttttgttatcatgctaaccacccggcg")
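# The four nucleotides
nc = c("a", "c", "t", "g")
# All 16 possible two-letter combinations, sorted alphabetically
lv = sort(Reduce(paste0, expand.grid(replicate(2, nc, simplify = FALSE))))
# For each sequence, extract every overlapping pair and count it;
# factor(..., levels = lv) keeps pairs with zero occurrences in the table
lapply(x, function(s)
  table(factor(sapply(seq(2, nchar(s), 1), function(i)
    substring(s, i - 1, i)),
    levels = lv)))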
#[[1]]
#aa ac ag at ca cc cg ct ga gc gg gt ta tc tg tt
# 3  2  2  4  1  0  6  3  0  6  4  7  7  2  5  4
#[[2]]
#aa ac ag at ca cc cg ct ga gc gg gt ta tc tg tt
# 3  4  1  4  5  2  4  4  2  4  1  5  3  5  6  3
#[[3]]
#aa ac ag at ca cc cg ct ga gc gg gt ta tc tg tt
# 2  4  4  4  3  3  2  4  2  4  1  3  7  1  3  9
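To make the sliding-window step in the code above easier to follow, here is a minimal sketch on a short made-up string (the string "tacta" and the variable names starts and pairs are illustrative, not from the question). It shows that the pairs overlap, so a sequence of n letters yields n - 1 pairs.

```r
# Illustrative input, not taken from the question
s <- "tacta"

# Start positions 1..(n - 1); each pair covers positions i and i + 1
starts <- seq_len(nchar(s) - 1)

# substring() is vectorised over its start and stop arguments
pairs <- substring(s, starts, starts + 1)
pairs
# "ta" "ac" "ct" "ta"

# table() only reports pairs that actually occur, which is why the
# answer wraps the pairs in factor(..., levels = lv) to keep the zeros
table(pairs)
```

For the 57-letter sequences in the question this gives 56 pairs per sequence, which matches the sum of the counts in each output row.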
Upvotes: 1
Reputation: 39174
There are two ways to use the lapply function.
The first is to provide a user-defined function and set all the arguments inside that function, as follows.
library(Biostrings)
v <- c("tactagcaatacgcttgcgttcggtggttaagtatgtataatgcgcgggcttgtcgt",
"tgctatcctgacagttgtcacgctgattggtgtcgttacaatctaacgcatcgccaa",
"gtactagagaactagtgcattagcttatttttttgttatcatgctaaccacccggcg")
lapply(v, function(x) oligonucleotideFrequency(DNAString(x), width = 2))
# [[1]]
# AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
#  3  2  2  4  1  0  6  3  0  6  4  7  7  2  5  4
#
# [[2]]
# AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
#  3  4  1  4  5  2  4  4  2  4  1  5  3  5  6  3
#
# [[3]]
# AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
#  2  4  4  4  3  3  2  4  2  4  1  3  7  1  3  9
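The wrapper function matters because lapply expects a function as its second argument. In the question, oligonucleotideFrequency(v, width = 2) is a call rather than a function, so R evaluates it on the whole list before lapply processes any element, and that is where the "list" signature error comes from. Below is a minimal base-R sketch of the same distinction, using nchar as a stand-in (the vector x and the names res and bad are illustrative assumptions, not from the question).

```r
x <- c("ta", "acg")   # illustrative input, not the question's data

# Passing a function: lapply calls it once per element
res <- lapply(x, function(s) nchar(s))

# Passing the result of a call: nchar(x) is evaluated first, giving an
# integer vector, and lapply then fails because that is not a function
bad <- tryCatch(lapply(x, nchar(x)), error = function(e) "error")
```

Here res holds one length per element, while bad ends up as the string "error" because the second lapply call never gets a usable function.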
The second is to provide the function name and pass the additional arguments through ..., as follows. With this option, each item in the list (in this case, v) automatically goes to the first argument of the function.
library(Biostrings)
v <- c("tactagcaatacgcttgcgttcggtggttaagtatgtataatgcgcgggcttgtcgt",
"tgctatcctgacagttgtcacgctgattggtgtcgttacaatctaacgcatcgccaa",
"gtactagagaactagtgcattagcttatttttttgttatcatgctaaccacccggcg")
v <- lapply(v, DNAString)
lapply(v, oligonucleotideFrequency, width = 2)
# [[1]]
# AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
#  3  2  2  4  1  0  6  3  0  6  4  7  7  2  5  4
#
# [[2]]
# AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
#  3  4  1  4  5  2  4  4  2  4  1  5  3  5  6  3
#
# [[3]]
# AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
#  2  4  4  4  3  3  2  4  2  4  1  3  7  1  3  9
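As a possible alternative to lapply, and assuming Biostrings is installed, the character vector can be converted to a DNAStringSet, which oligonucleotideFrequency accepts directly. This is a sketch rather than part of the answer above; the names dna and freq are illustrative. The result should be a matrix with one row per sequence and one column per dinucleotide.

```r
library(Biostrings)

v <- c("tactagcaatacgcttgcgttcggtggttaagtatgtataatgcgcgggcttgtcgt",
       "tgctatcctgacagttgtcacgctgattggtgtcgttacaatctaacgcatcgccaa",
       "gtactagagaactagtgcattagcttatttttttgttatcatgctaaccacccggcg")

# A DNAStringSet holds several sequences in a single object
dna <- DNAStringSet(v)

# Rows correspond to sequences, columns to the dinucleotides AA, AC, ..., TT
freq <- oligonucleotideFrequency(dna, width = 2)
dim(freq)
```

A matrix is often easier to work with downstream than a list of vectors, for example when comparing the same dinucleotide across sequences with freq[, "CG"].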
Upvotes: 1