Mell


Solving the Rosalind challenge "Finding a Motif in DNA"

The problem I am working on gives you a DNA string and a substring, and your code must output the start position of every occurrence of the substring in the DNA string. For example, given the DNA string "GATATATGCATATACTT" and the substring "ATAT", the output should be: 2 4 10. This is the example I am using to test my code.
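For reference, here is a minimal sketch of what the task asks for, written as a brute-force sliding-window scan in base R. The function name `find_motif` is hypothetical (it is not part of the original code or of Rosalind). It compares every window of the motif's length against the motif, so overlapping occurrences are counted naturally, and it reports 1-based positions; on the example above it returns 2, 4 and 10.

```r
# Hypothetical helper: brute-force sliding-window search for a motif.
# Returns the 1-based start position of every occurrence, overlaps included.
find_motif <- function(s, t) {
  n <- nchar(s)
  k <- nchar(t)
  if (k == 0 || k > n) return(integer(0))
  starts <- seq_len(n - k + 1)                    # every possible start position
  windows <- substring(s, starts, starts + k - 1) # all length-k windows of s
  starts[windows == t]                            # keep positions where the window equals t
}

find_motif("GATATATGCATATACTT", "ATAT")
```

This does O(n·k) character comparisons, which is simple and avoids any regular-expression subtleties.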

Link to the Rosalind problem, if needed: https://rosalind.info/problems/subs/

My current code almost works:

# Finding a Motif in DNA

input <- readLines("input.txt", warn = FALSE) # Load data

DNAseq1 <- input[1] # line 1 of data file is DNA string
substring1 <- input[2] # line 2 of data file is the substring being searched for

# Test commands to check that the lines are assigned to the variables correctly
print(DNAseq1)
print(substring1)

print(unlist(gregexpr(substring1, DNAseq1)))

The output it returns is:

[1] 2 10

However, it should output:

[1] 2 4 10

The DNA string is "GATATATGCATATACTT" and the substring is "ATAT". I believe it misses position 4 (the second occurrence) because that occurrence overlaps the first one, which starts at position 2. I'm not sure how to fix this; please help!
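One possible fix, sketched here under the assumption that the motif contains only the letters A, C, G and T (so it has no regular-expression metacharacters): wrap the motif in a zero-width lookahead, `(?=...)`. Ordinary `gregexpr()` matching does not report overlapping matches, because each new search resumes after the end of the previous match; a lookahead consumes no characters, so every starting position gets a chance to match. `perl = TRUE` is needed because R's default regex engine does not support lookaheads. The inputs are hard-coded below for illustration instead of being read from `input.txt`.

```r
DNAseq1 <- "GATATATGCATATACTT"
substring1 <- "ATAT"

# Zero-width lookahead: each match has length 0, so the next search can start
# one character later and overlapping occurrences are all reported.
pattern <- paste0("(?=", substring1, ")")
positions <- unlist(gregexpr(pattern, DNAseq1, perl = TRUE))

print(positions)                             # should show 2 4 10
writeLines(paste(positions, collapse = " ")) # space-separated, as in the Rosalind sample output
```

Note that `gregexpr()` returns -1 when there is no match at all, so that case may need handling before submitting an answer.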
