Reputation: 51
I am looking for a way to match the beginning of a line against a regex and have the line returned afterwards. The set is quite extensive, so I cannot simply use the method given in Python regular expressions matching within set. I was also wondering whether regex is the best solution. I have read http://docs.python.org/3.3/library/re.html but, alas, it does not seem to hold the answer. Here is what I have tried so far...
import re
import os
import itertools
f2 = open(file_path)
unilist = []
bases = ['A', 'G', 'C', 'N', 'U']
patterns = set(''.join(per) for per in itertools.product(bases, repeat=5))
#stuff
if re.match(r'.*?(?:patterns)', line):
    print(line)
    unilist.append(next(f2).strip())
print(unilist)
You see, the problem is that I do not know how to refer to my set...
The file I am trying to match it to looks like:
@SRR566546.970 HWUSI-EAS1673_11067_FC7070M:4:1:2299:1109 length=50
TTGCCTGCCTATCATTTTAGTGCCTGTGAGGTGGAGATGTGAGGATCAGT
+
hhhhhhhhhhghhghhhhhfhhhhhfffffeee[X]b[d[ed`[Y[^Y
Upvotes: 0
Views: 2476
Reputation: 27585
According to what I've understood from your question, it seems to me that this could fit your need:
import re
sss = '''dfgsdfAUGNA321354354
!=**$=)"nNNUUG54788
=AkjhhUUNGffdffAAGjhff1245GGAUjkjdUU
.....cv GAUNAANNUGGA'''
print(re.findall('^(.+?[AGCNU]{5})', sss, re.MULTILINE))
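The example above works on one multi-line string, so it needs ^ together with re.MULTILINE. If the data are instead read from a file one line at a time, a minimal sketch could drop both and use re.match, which only attempts a match at the start of the string it is given. The helper name prefix_up_to_bases is hypothetical, and io.StringIO stands in for an open file:

```python
import io
import re

# Same idea as the answer's pattern, minus '^' and re.MULTILINE:
# re.match() only tries to match at the start of each line it is given.
PREFIX_RE = re.compile(r'(.+?[AGCNU]{5})')

def prefix_up_to_bases(lines):
    """Hypothetical helper: yield the start of each line, up to and
    including its first run of five characters from A, G, C, N, U."""
    for line in lines:
        m = PREFIX_RE.match(line)
        if m:
            yield m.group(1)

# io.StringIO stands in for open(file_path) here.
sample = io.StringIO('dfgsdfAUGNA321354354\n.....cv GAUNAANNUGGA\nno match here\n')
print(list(prefix_up_to_bases(sample)))
```

Lines with no run of five such characters are skipped rather than yielding None.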
Upvotes: 0
Reputation: 1123410
You are going about it the wrong way.
You simply leave the set of characters to the regular expression:
re.search('[AGCNU]{5}', line)
matches any 5-character pattern built from those 5 characters; that matches the same 3125 different combinations you generated with your set line, but doesn't need to build all possible combinations up front.
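To check that claim, a short sketch can rebuild the asker's set with itertools.product and confirm that the character class accepts every one of its 3125 strings. Because the question is about the beginning of a line, the sketch also contrasts re.match, which is anchored at the start, with re.search, which scans the whole line. The sample line is made up for illustration:

```python
import itertools
import re

bases = ['A', 'G', 'C', 'N', 'U']
# The asker's approach: every 5-letter combination, built up front.
patterns = set(''.join(per) for per in itertools.product(bases, repeat=5))

# The answer's approach: one character class, repeated five times.
five_bases = re.compile('[AGCNU]{5}')

print(len(patterns))  # 5 ** 5 == 3125
# Each combination is exactly 5 characters, so a match at the start covers it all.
print(all(five_bases.match(p) for p in patterns))

# re.match() only looks at the start of the line; re.search() looks anywhere.
line = 'xxGAUNA'
print(bool(five_bases.match(line)), bool(five_bases.search(line)))
```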
Otherwise, your regular expression attempt had no connection to your patterns variable; the pattern r'.*?(?:patterns)' would match 0 or more arbitrary characters, followed by the literal text 'patterns'.
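A quick sketch of that point: the regex text never refers to the Python variable, so it only matches the word "patterns" itself. To match against the set's contents, you would have to build an alternation from them yourself, for example with re.escape and '|'.join. That works, but it produces a 3125-branch pattern where '[AGCNU]{5}' says the same thing. The sample strings are invented for illustration:

```python
import itertools
import re

bases = ['A', 'G', 'C', 'N', 'U']
patterns = set(''.join(per) for per in itertools.product(bases, repeat=5))

# The name inside the raw string is just text; Python does not substitute the variable.
literal = re.compile(r'.*?(?:patterns)')
print(bool(literal.match('GAUNA')))          # False
print(bool(literal.match('some patterns')))  # True

# Building an alternation from the set's contents does reference the set,
# but yields a 3125-branch pattern; '[AGCNU]{5}' expresses the same thing.
alternation = re.compile('(?:' + '|'.join(map(re.escape, sorted(patterns))) + ')')
print(bool(alternation.match('GAUNA')))      # True
print(bool(alternation.match('TTGCC')))      # False ('T' is not in bases)
```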
Upvotes: 2