Just a simple problem that has been giving me some issues. I want to take a 20-character string:
s0<-c("ABCDABCDABCDABCDABCD")
and make a list of strings, each with a unique header, in which exactly one letter has been changed to one of the other three. For example:
s0
ABCDABCDABCDABCDABCD # original
s1
BBCDABCDABCDABCDABCD # first 'A' to 'B'
s2
CBCDABCDABCDABCDABCD # first 'A' to 'C'
s3
DBCDABCDABCDABCDABCD # first 'A' to 'D'
s4
AACDABCDABCDABCDABCD # first 'B' to 'A'
s5
ACCDABCDABCDABCDABCD # first 'B' to 'C'
s6
ADCDABCDABCDABCDABCD # first 'B' to 'D'
etc...
I want to write it to a .txt file once the list has been generated.
I only want to make one character change per string version, but I would like the list to contain every possible single-character substitution (a change at each position, to each of the other three letters).
Sorry if this is a simple problem. I was wondering if there is a way to do this with lapply and gsub, or with the stringr package, etc.
Thanks in advance.
Here's another answer which may be a little easier to understand and modify:
s0 <- "ABCDABCDABCDABCDABCD"
nucleotides <- c("A", "B", "C", "D")
# pre-allocate space for the original plus 3 substitutions per position
sequences <- character(1 + 3 * nchar(s0))
sequences[1] <- s0
num_found <- 1
for (i in seq_len(nchar(s0))) {
  prefix   <- substring(s0, 1, i - 1)  # everything before position i
  old_base <- substring(s0, i, i)      # the character being replaced
  suffix   <- substring(s0, i + 1)     # everything after position i
  for (new_base in nucleotides) {
    # skip the letter already at position i; keep the other three
    if (new_base != old_base) {
      num_found <- num_found + 1
      sequences[num_found] <- paste(prefix, new_base, suffix, sep = "")
    }
  }
}
print(sequences)
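The question also asks for a unique header per string and a .txt file, which neither answer covers. Below is a minimal sketch in base R, assuming headers of the form s0, s1, s2, ... as in the question's example and a hypothetical output path under tempdir(). So that the block runs on its own, it rebuilds the variants using the replacement form substr(x, i, i) <- value instead of pasting prefix and suffix together:

```r
# Sketch: original string plus every single-letter substitution, each with a
# header line ("s0", "s1", ...), written to a plain-text file.
# The output location is hypothetical; change it to suit.
s0 <- "ABCDABCDABCDABCDABCD"
nucleotides <- c("A", "B", "C", "D")

variants <- s0  # the unmodified string comes first, as s0
for (i in seq_len(nchar(s0))) {
  # only the three letters that differ from the one already at position i
  for (new_base in setdiff(nucleotides, substr(s0, i, i))) {
    v <- s0
    substr(v, i, i) <- new_base  # base R replacement form of substr()
    variants <- c(variants, v)
  }
}

headers <- paste0("s", seq_along(variants) - 1)  # "s0", "s1", "s2", ...

# rbind() puts each header directly above its string; as.vector() reads the
# 2-row matrix column by column: header, string, header, string, ...
out_lines <- as.vector(rbind(headers, variants))

out_file <- file.path(tempdir(), "variants.txt")  # hypothetical location
writeLines(out_lines, out_file)
```

The file then starts with s0, the original string, s1, BBCDABCDABCDABCDABCD, and so on, matching the layout in the question. Growing variants inside the loop is fine for 61 strings; for much longer inputs, pre-allocating as in the answer above would be preferable.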
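Here's a solution that uses lapply over the index of the character to replace. At each position, paste0 pastes in all four letters, including the one already there, so the original string is produced once per position; unique() drops those repeats, leaving the original plus 3 * 20 = 60 variants (61 strings):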
alphabet <- c("A", "B", "C", "D")  # avoids masking base R's built-in `letters`
s0 <- "ABCDABCDABCDABCDABCD"
combos <- unique(unlist(lapply(seq_len(nchar(s0)), function(idx) {
  paste0(substr(s0, 1, idx - 1), alphabet, substr(s0, idx + 1, nchar(s0)))
})))
combos
# [1] "ABCDABCDABCDABCDABCD" "BBCDABCDABCDABCDABCD" "CBCDABCDABCDABCDABCD"
# [4] "DBCDABCDABCDABCDABCD" "AACDABCDABCDABCDABCD" "ACCDABCDABCDABCDABCD"
# [7] "ADCDABCDABCDABCDABCD" "ABADABCDABCDABCDABCD" "ABBDABCDABCDABCDABCD"
# [10] "ABDDABCDABCDABCDABCD" "ABCAABCDABCDABCDABCD" "ABCBABCDABCDABCDABCD"
# [13] "ABCCABCDABCDABCDABCD" "ABCDBBCDABCDABCDABCD" "ABCDCBCDABCDABCDABCD"
# [16] "ABCDDBCDABCDABCDABCD" "ABCDAACDABCDABCDABCD" "ABCDACCDABCDABCDABCD"
# [19] "ABCDADCDABCDABCDABCD" "ABCDABADABCDABCDABCD" "ABCDABBDABCDABCDABCD"
# [22] "ABCDABDDABCDABCDABCD" "ABCDABCAABCDABCDABCD" "ABCDABCBABCDABCDABCD"
# [25] "ABCDABCCABCDABCDABCD" "ABCDABCDBBCDABCDABCD" "ABCDABCDCBCDABCDABCD"
# [28] "ABCDABCDDBCDABCDABCD" "ABCDABCDAACDABCDABCD" "ABCDABCDACCDABCDABCD"
# [31] "ABCDABCDADCDABCDABCD" "ABCDABCDABADABCDABCD" "ABCDABCDABBDABCDABCD"
# [34] "ABCDABCDABDDABCDABCD" "ABCDABCDABCAABCDABCD" "ABCDABCDABCBABCDABCD"
# [37] "ABCDABCDABCCABCDABCD" "ABCDABCDABCDBBCDABCD" "ABCDABCDABCDCBCDABCD"
# [40] "ABCDABCDABCDDBCDABCD" "ABCDABCDABCDAACDABCD" "ABCDABCDABCDACCDABCD"
# [43] "ABCDABCDABCDADCDABCD" "ABCDABCDABCDABADABCD" "ABCDABCDABCDABBDABCD"
# [46] "ABCDABCDABCDABDDABCD" "ABCDABCDABCDABCAABCD" "ABCDABCDABCDABCBABCD"
# [49] "ABCDABCDABCDABCCABCD" "ABCDABCDABCDABCDBBCD" "ABCDABCDABCDABCDCBCD"
# [52] "ABCDABCDABCDABCDDBCD" "ABCDABCDABCDABCDAACD" "ABCDABCDABCDABCDACCD"
# [55] "ABCDABCDABCDABCDADCD" "ABCDABCDABCDABCDABAD" "ABCDABCDABCDABCDABBD"
# [58] "ABCDABCDABCDABCDABDD" "ABCDABCDABCDABCDABCA" "ABCDABCDABCDABCDABCB"
# [61] "ABCDABCDABCDABCDABCC"
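Because unique() is what removes the 19 extra copies of the original string, it can be worth confirming the result. A minimal sketch, assuming every string has the same length as s0, uses a hypothetical helper n_diff to count the positions at which each entry of combos differs from s0:

```r
# Sketch: check that each variant differs from s0 in exactly one position
# and that there are 1 + 3 * nchar(s0) strings in total.
s0 <- "ABCDABCDABCDABCDABCD"
alphabet <- c("A", "B", "C", "D")  # named so base R's `letters` is not masked

# Same construction as the lapply answer above
combos <- unique(unlist(lapply(seq_len(nchar(s0)), function(idx) {
  paste0(substr(s0, 1, idx - 1), alphabet, substr(s0, idx + 1, nchar(s0)))
})))

# Hypothetical helper: number of positions where x and ref differ
# (assumes nchar(x) == nchar(ref))
n_diff <- function(x, ref) {
  sum(strsplit(x, "")[[1]] != strsplit(ref, "")[[1]])
}

diffs <- vapply(combos, n_diff, integer(1), ref = s0, USE.NAMES = FALSE)

# one string differs in 0 positions (the original), 60 in exactly 1
print(table(diffs))
stopifnot(length(combos) == 1 + 3 * nchar(s0), all(diffs == 1 | combos == s0))
```

If the strings could differ in length, != would recycle the shorter character vector (with a warning), so n_diff would need an explicit length check first.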