Reputation: 33
The problem I am currently working on gives you a DNA string and a substring, and the code must output the start position of each occurrence of the substring in the DNA string. For example, given the DNA string "GATATATGCATATACTT" and the substring "ATAT", the output should be 2 4 10. This is the example I am using to test my code.
Link to the Rosalind problem, if needed: https://rosalind.info/problems/subs/
The code I currently have almost works correctly:
# Finding a Motif in DNA
input <- readLines("input.txt", warn = FALSE) # Load data
DNAseq1 <- input[1] # line 1 of data file is DNA string
substring1 <- input[2] # line 2 of data file is the substring being searched for
print(DNAseq1) # Test commands to make sure the lines are being assigned to the variables correctly
print(substring1)
print(unlist(gregexpr(substring1, DNAseq1)))
The output it returns is:
[1] 2 10
However, it should output:
[1] 2 4 10
The DNA string is "GATATATGCATATACTT" and the substring is "ATAT". I believe it is not picking up position 4 (the second occurrence of the substring) because that occurrence overlaps the first one: gregexpr appears to resume searching only after the end of the previous match. I'm not sure how to fix this, please help!
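One possible direction, offered as a minimal sketch rather than a definitive fix: wrapping the pattern in a zero-width lookahead `(?=...)` with `perl = TRUE` lets `gregexpr` report a position without consuming the matched characters, so overlapping occurrences should be found as well. The helper name `find_overlapping` is hypothetical, and the sketch assumes the substring contains only plain letters (no regex metacharacters).

```r
# Hypothetical helper: start positions of all matches, including overlapping ones.
# Assumes `pattern` has no regex metacharacters (true for plain DNA letters).
find_overlapping <- function(pattern, text) {
  # (?=...) is a zero-width lookahead: it matches at a position without
  # consuming characters, so the next search starts one character later
  # instead of after the end of the previous match.
  hits <- gregexpr(paste0("(?=", pattern, ")"), text, perl = TRUE)[[1]]
  # gregexpr returns -1 when nothing matches; drop that sentinel
  as.integer(hits[hits > 0])
}

positions <- find_overlapping("ATAT", "GATATATGCATATACTT")
print(positions)
```

With the test data from the question, this would replace the last `print(unlist(gregexpr(...)))` line, passing `substring1` and `DNAseq1` as the two arguments.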
Upvotes: 1
Views: 32