tomathon

Reputation: 854

Replacing one character once in string for an n-length string in R

A simple problem that has been giving me some trouble. I want to take a 20-character string:

s0<-c("ABCDABCDABCDABCDABCD")

and make a list of strings, each with a unique header and with exactly one letter changed to one of the other three. For example:

s0
ABCDABCDABCDABCDABCD  # original
s1
BBCDABCDABCDABCDABCD  # first 'A' to 'B'
s2
CBCDABCDABCDABCDABCD  # first 'A' to 'C'
s3
DBCDABCDABCDABCDABCD  # first character 'A' to 'D'
s4
AACDABCDABCDABCDABCD  # second character 'B' to 'A'
s5
ACCDABCDABCDABCDABCD  # second character 'B' to 'C'
s6
ADCDABCDABCDABCDABCD  # second character 'B' to 'D'

etc...

I want to write it to a .txt file once the list has been generated.

I only want to make one character change per string version, but I would like to have a list of versions that contain all possible combinations (changes at each position).

Sorry if this is a simple problem. I was wondering if there is a way to do this with lapply and gsub, the stringr package, or something similar?

Thanks in advance.

Upvotes: 1

Views: 100

Answers (2)

andypea

Reputation: 1401

Here's another answer which may be a little easier to understand and modify:

s0 <- "ABCDABCDABCDABCDABCD"
nucleotides <- c("A", "B", "C", "D")

# Pre-allocate space for the results: the original plus 3 alternatives per position
sequences <- character(1 + 3 * nchar(s0))

sequences[1] <- s0
num_found <- 1

for (i in seq_len(nchar(s0))) {
  prefix <- substring(s0, 1, i - 1)   # everything before position i
  old_base <- substring(s0, i, i)     # the character being replaced
  suffix <- substring(s0, i + 1)      # everything after position i

  for (new_base in nucleotides) {
    if (new_base != old_base) {
      num_found <- num_found + 1
      sequences[num_found] <- paste0(prefix, new_base, suffix)
    }
  }
}

print(sequences)
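
If you want to convince yourself that every generated string really differs from the original at exactly one position, a quick check along these lines may help. This is only a sketch: `variants` and the helper `n_diff` are names I made up for illustration, and the variants are rebuilt here with lapply so the block runs on its own.

```r
s0 <- "ABCDABCDABCDABCDABCD"
nucleotides <- c("A", "B", "C", "D")

# Rebuild the variants (same idea as the loop above, without the original string).
variants <- unlist(lapply(seq_len(nchar(s0)), function(i) {
  old_base <- substr(s0, i, i)
  new_bases <- setdiff(nucleotides, old_base)
  paste0(substr(s0, 1, i - 1), new_bases, substr(s0, i + 1, nchar(s0)))
}))

# Hypothetical helper: number of positions at which two equal-length strings differ.
n_diff <- function(a, b) sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])

diffs <- vapply(variants, n_diff, integer(1), b = s0)

# Three variants per position, each exactly one change away, none repeated.
stopifnot(
  length(variants) == 3 * nchar(s0),
  all(diffs == 1),
  !anyDuplicated(variants)
)
```

If any of the conditions fails, stopifnot() raises an error naming the failed condition; otherwise it returns silently.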

Upvotes: 1

josliber

Reputation: 44340

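Here's a solution that uses lapply over the index of the character to replace. Each position yields four strings, one of which is the unchanged original, so unique() keeps the original once (as the first element) and drops the repeats: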

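# Named `alphabet` rather than `letters` to avoid masking R's built-in `letters` constant
alphabet <- c("A", "B", "C", "D")
s0 <- "ABCDABCDABCDABCDABCD"
combos <- unique(unlist(lapply(seq_len(nchar(s0)), function(idx) {
  paste0(substr(s0, 1, idx - 1), alphabet, substr(s0, idx + 1, nchar(s0)))
})))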
combos
#  [1] "ABCDABCDABCDABCDABCD" "BBCDABCDABCDABCDABCD" "CBCDABCDABCDABCDABCD"
#  [4] "DBCDABCDABCDABCDABCD" "AACDABCDABCDABCDABCD" "ACCDABCDABCDABCDABCD"
#  [7] "ADCDABCDABCDABCDABCD" "ABADABCDABCDABCDABCD" "ABBDABCDABCDABCDABCD"
# [10] "ABDDABCDABCDABCDABCD" "ABCAABCDABCDABCDABCD" "ABCBABCDABCDABCDABCD"
# [13] "ABCCABCDABCDABCDABCD" "ABCDBBCDABCDABCDABCD" "ABCDCBCDABCDABCDABCD"
# [16] "ABCDDBCDABCDABCDABCD" "ABCDAACDABCDABCDABCD" "ABCDACCDABCDABCDABCD"
# [19] "ABCDADCDABCDABCDABCD" "ABCDABADABCDABCDABCD" "ABCDABBDABCDABCDABCD"
# [22] "ABCDABDDABCDABCDABCD" "ABCDABCAABCDABCDABCD" "ABCDABCBABCDABCDABCD"
# [25] "ABCDABCCABCDABCDABCD" "ABCDABCDBBCDABCDABCD" "ABCDABCDCBCDABCDABCD"
# [28] "ABCDABCDDBCDABCDABCD" "ABCDABCDAACDABCDABCD" "ABCDABCDACCDABCDABCD"
# [31] "ABCDABCDADCDABCDABCD" "ABCDABCDABADABCDABCD" "ABCDABCDABBDABCDABCD"
# [34] "ABCDABCDABDDABCDABCD" "ABCDABCDABCAABCDABCD" "ABCDABCDABCBABCDABCD"
# [37] "ABCDABCDABCCABCDABCD" "ABCDABCDABCDBBCDABCD" "ABCDABCDABCDCBCDABCD"
# [40] "ABCDABCDABCDDBCDABCD" "ABCDABCDABCDAACDABCD" "ABCDABCDABCDACCDABCD"
# [43] "ABCDABCDABCDADCDABCD" "ABCDABCDABCDABADABCD" "ABCDABCDABCDABBDABCD"
# [46] "ABCDABCDABCDABDDABCD" "ABCDABCDABCDABCAABCD" "ABCDABCDABCDABCBABCD"
# [49] "ABCDABCDABCDABCCABCD" "ABCDABCDABCDABCDBBCD" "ABCDABCDABCDABCDCBCD"
# [52] "ABCDABCDABCDABCDDBCD" "ABCDABCDABCDABCDAACD" "ABCDABCDABCDABCDACCD"
# [55] "ABCDABCDABCDABCDADCD" "ABCDABCDABCDABCDABAD" "ABCDABCDABCDABCDABBD"
# [58] "ABCDABCDABCDABCDABDD" "ABCDABCDABCDABCDABCA" "ABCDABCDABCDABCDABCB"
# [61] "ABCDABCDABCDABCDABCC"
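
The question also asks for a unique header per string and a .txt file. One possible way to do that is sketched below. The header scheme (s0 for the original, s1 to s60 for the variants) and the output path are my assumptions, not something the question pins down, and the vector is rebuilt here so the block runs on its own.

```r
alphabet <- c("A", "B", "C", "D")
s0 <- "ABCDABCDABCDABCDABCD"
combos <- unique(unlist(lapply(seq_len(nchar(s0)), function(idx) {
  paste0(substr(s0, 1, idx - 1), alphabet, substr(s0, idx + 1, nchar(s0)))
})))

# Assumed header scheme: s0 for the original, s1, s2, ... for the variants.
headers <- paste0("s", seq_along(combos) - 1)

# rbind() stacks headers over sequences; as.vector() reads the matrix column by
# column, which interleaves them: s0, <sequence>, s1, <sequence>, ...
out_lines <- as.vector(rbind(headers, combos))

# Hypothetical output path; tempfile() avoids writing into the working directory.
out_file <- tempfile(fileext = ".txt")
writeLines(out_lines, out_file)
```

Swap `out_file` for a real path such as "variants.txt" when you want to keep the result. If the file is meant for a tool that expects FASTA-style headers, pasting ">" in front of each header would be a small change.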

Upvotes: 3
