Matching regex to set

Question

I am looking for a way to match the beginning of a line against a regex and have the line returned afterwards. The set is quite extensive, which is why I cannot simply use the method given in "Python regular expressions matching within set". I was also wondering whether regex is the best solution. I have read http://docs.python.org/3.3/library/re.html; alas, it does not seem to hold the answer. Here is what I have tried so far:

import re
import os
import itertools

f2 = open(file_path)

unilist = []

bases=['A','G','C','N','U']

patterns= set(''.join(per) for per in itertools.product(bases, repeat=5))

#stuff

if re.match(r'.*?(?:patterns)', line):
    print(line)
    unilist.append(next(f2).strip())
    print (unilist)

You see, the problem is that I do not know how to refer to my set...

The file I am trying to match it to looks like:

@SRR566546.970 HWUSI-EAS1673_11067_FC7070M:4:1:2299:1109 length=50
TTGCCTGCCTATCATTTTAGTGCCTGTGAGGTGGAGATGTGAGGATCAGT

+

hhhhhhhhhhghhghhhhhfhhhhhfffffeee[X]b[d[ed`[Y[^Y

Martijn Pieters · Accepted Answer

You are going about it the wrong way.

Simply leave the set of characters to the regular expression, using a character class:

re.search('[AGCNU]{5}', line)

This matches any 5-character sequence built from those 5 characters. That covers the same 3125 (5 to the power of 5) combinations you generated with your set line, but without building every possible combination up front.
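As a quick sanity check of that equivalence, the sketch below rebuilds the set from the question and confirms that every member is matched in full by the character class. The names `five_bases`, `all_match` and `count` are introduced here for illustration only.

```python
import itertools
import re

bases = ['A', 'G', 'C', 'N', 'U']

# The set from the question: every 5-character combination of the bases.
patterns = set(''.join(per) for per in itertools.product(bases, repeat=5))

# The character-class equivalent; no combinations need to be generated.
five_bases = re.compile(r'[AGCNU]{5}')

# fullmatch requires the whole string to match, so this checks that each
# generated combination is accepted by the character class.
all_match = all(five_bases.fullmatch(p) is not None for p in patterns)
count = len(patterns)

print(count, all_match)
```

The set is only built here to perform the check; in real code the compiled pattern alone should be enough.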

As for your own attempt, the regular expression had no connection to your patterns variable: the pattern r'.*?(?:patterns)' matches 0 or more arbitrary characters, followed by the literal text 'patterns'.
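A small sketch of both points, using made-up input strings rather than the real file: the pattern from the question only succeeds on text containing the word "patterns". The question asks about the beginning of a line, so this sketch uses `re.match`, which only matches at the start of the string, whereas `re.search` finds a match anywhere in it. The sample `lines` list and the variable names are hypothetical.

```python
import re

# The pattern from the question. "patterns" inside the string is plain
# text; it is not replaced by the Python variable of the same name.
literal = r'.*?(?:patterns)'

matches_literal_word = re.match(literal, 'some patterns here') is not None
matches_sequence = re.match(literal, 'AGCUN') is not None

# re.match anchors at the start of the string, so only lines that
# begin with 5 of the allowed characters are kept.
start_only = re.compile(r'[AGCNU]{5}')
lines = ['AGCUNAAGG', 'xxAGCUNAA', 'AGC']  # hypothetical sample lines
kept = [line for line in lines if start_only.match(line)]

print(matches_literal_word, matches_sequence, kept)
```

If a match anywhere in the line is what is wanted, swapping `start_only.match` for `start_only.search` would also keep the second sample line.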
