Reputation: 13
I am very new to programming and not yet comfortable with any programming language. I bought a book on programming for biologists and have fumbled through a few things. I want to read sequences from a file, then find and extract a variable region from each one. My code is below:
**
#!/usr/bin/python
#for extracting GAA sequences
import os
import sys
import re
#opens sequence file and defines it as reps
reps = open('142sequences.txt')
#defining what to read
line = reps.readlines()
#defines what we are looking for in rep lines
for line in reps:
    sear = re.search(r"C[A]{2,}G[ATCG]{17, 2700}AAT[A]{2,4}G[A]{2,}", reps)
    if sear:
        repeats = sear.group()
        print(repeats)
    else:
        print('Not Recognized')
** I get nothing in return. Please help
Upvotes: 1
Views: 61
Reputation: 180401
You need to pass each line to re.search, not reps, which is the file object for the whole file:
import re

with open('142sequences.txt') as reps:
    # iterate over each line in the file
    for line in reps:
        # pass each line to re.search; the quantifier is written {17,2700}
        # with no space, because re treats "{17, 2700}" as literal text
        sear = re.search(r"C[A]{2,}G[ATCG]{17,2700}AAT[A]{2,4}G[A]{2,}", line)
        if sear:
            repeats = sear.group()
            print(repeats)
        else:
            print('Not Recognized')
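One detail in the pattern is worth checking separately. Python's re module only recognizes a brace quantifier when there is no space inside it, so `{17, 2700}` is matched as literal characters and the pattern cannot match a plain DNA string. A minimal sketch comparing the two spellings follows; the test sequence and the helper name `find_repeat` are made up for illustration and are not from your data.

```python
import re

# Pattern exactly as written in the question: note the space in "{17, 2700}".
with_space = r"C[A]{2,}G[ATCG]{17, 2700}AAT[A]{2,4}G[A]{2,}"
# Same pattern with the space removed, so {17,2700} is a real quantifier.
no_space = r"C[A]{2,}G[ATCG]{17,2700}AAT[A]{2,4}G[A]{2,}"

# Hypothetical sequence built to fit the intended pattern:
# CAAG + 20 bases + AAT + AAA + G + AA
seq = "CAAG" + "ACGT" * 5 + "AATAAAGAA"

def find_repeat(pattern, sequence):
    """Return the matched text, or None if the pattern does not match."""
    match = re.search(pattern, sequence)
    return match.group() if match else None

print(find_repeat(with_space, seq))  # None: the braces are taken literally
print(find_repeat(no_space, seq))    # prints the whole test sequence
```

If the second call finds the region and the first does not, the space in the quantifier is the cause.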
Calling readlines reads all the lines into a list and leaves the file iterator exhausted, so the for loop in your own code never runs its body, which is why nothing is printed. Had the loop run, re.search would have raised a TypeError, because it expects a string and reps is a file object.
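Both effects can be reproduced without the real file. The sketch below uses io.StringIO as a stand-in for open('142sequences.txt'), and the two lines of content are invented for the demonstration.

```python
import io
import re

# io.StringIO stands in for the opened file; the content is made up.
reps = io.StringIO("CAAG\nGATTACA\n")

# readlines() reads everything and leaves the position at the end of the file.
lines = reps.readlines()
print(len(lines))

# Iterating afterwards yields nothing, so a for loop body would never run.
leftover = [line for line in reps]
print(leftover)

# Passing the file-like object instead of a string to re.search fails.
raised_type_error = False
try:
    re.search(r"C[A]{2,}G", reps)
except TypeError:
    raised_type_error = True
print(raised_type_error)
```

Either drop the readlines call and loop over the file directly, or keep the list it returns and loop over that list instead.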
Upvotes: 1