Fabien

Reputation: 51

Matching regex to set

I am looking for a way to match the beginning of a line against a regex and have the line returned afterwards. The set is quite large, which is why I cannot simply use the method given in "Python regular expressions matching within set". I am also wondering whether a regex is the best solution at all. I have read http://docs.python.org/3.3/library/re.html but, alas, it does not seem to hold the answer. Here is what I have tried so far:

import re
import os
import itertools

f2 = open(file_path)

unilist = []

bases=['A','G','C','N','U']

patterns= set(''.join(per) for per in itertools.product(bases, repeat=5))

#stuff

if re.match(r'.*?(?:patterns)', line):
    print(line)
    unilist.append(next(f2).strip())
    print (unilist)

You see, the problem is that I do not know how to refer to my set from inside the regex.

The file I am trying to match against looks like this:

@SRR566546.970 HWUSI-EAS1673_11067_FC7070M:4:1:2299:1109 length=50
TTGCCTGCCTATCATTTTAGTGCCTGTGAGGTGGAGATGTGAGGATCAGT
+
hhhhhhhhhhghhghhhhhfhhhhhfffffeee[X]b[d[ed`[Y[^Y

Upvotes: 0

Views: 2476

Answers (2)

eyquem

Reputation: 27585

If I have understood your question correctly, this could fit your needs:

import re

sss = '''dfgsdfAUGNA321354354
!=**$=)"nNNUUG54788
=AkjhhUUNGffdffAAGjhff1245GGAUjkjdUU
.....cv GAUNAANNUGGA'''

# Python 3: print is a function; the raw string keeps the pattern literal
print(re.findall(r'^(.+?[AGCNU]{5})', sss, re.MULTILINE))
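A Python 3 sketch on the same sample string, not part of the original answer, highlighting two details of the pattern above. First, `findall` returns only the shortest prefix up to the five bases, not the whole line. Second, `.+?` needs at least one character before the bases, so a line that begins with them (the made-up line `AUGNAxyz` below) is skipped. The `.*?` and whole-line patterns are illustrative variants.

```python
import re

sss = '''dfgsdfAUGNA321354354
!=**$=)"nNNUUG54788
=AkjhhUUNGffdffAAGjhff1245GGAUjkjdUU
.....cv GAUNAANNUGGA'''

# Pattern from the answer: shortest prefix (at least one char) ending in five bases.
prefixes = re.findall(r'^(.+?[AGCNU]{5})', sss, re.MULTILINE)

# Variant: whole lines containing five consecutive bases ('.' never crosses a newline).
whole_lines = re.findall(r'^.*[AGCNU]{5}.*$', sss, re.MULTILINE)

# Made-up line that starts with the 5-mer: '.+?' misses it, '.*?' does not.
missed = re.findall(r'^(.+?[AGCNU]{5})', 'AUGNAxyz', re.MULTILINE)
found = re.findall(r'^(.*?[AGCNU]{5})', 'AUGNAxyz', re.MULTILINE)

print(prefixes)
print(whole_lines)
print(missed, found)
```

The third sample line has no run of five bases, so neither pattern returns anything for it.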

Upvotes: 0

Martijn Pieters

Reputation: 1123410

You are going about it the wrong way.

Simply leave the set of characters to the regular expression, as a character class:

re.search(r'[AGCNU]{5}', line)

This matches any 5-character string built from those five characters: the same 3125 (5^5) combinations your set line generated, but without having to build all of them up front.
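To check that equivalence, here is a small sketch (not from the original answer) that builds the set exactly as the question does, splices it into one regex as an explicit alternation (roughly what `(?:patterns)` seems to have been aiming at), and compares it with the character class on a few made-up strings. Because every alternative is exactly five characters long, both find the same leftmost five-base run.

```python
import itertools
import re

bases = ['A', 'G', 'C', 'N', 'U']
patterns = set(''.join(per) for per in itertools.product(bases, repeat=5))

# Splice every string in the set into one alternation: (?:AAAAA|AAAAC|...).
alternation = re.compile('(?:' + '|'.join(map(re.escape, sorted(patterns))) + ')')

# The character-class version: no need to enumerate the combinations.
char_class = re.compile(r'[AGCNU]{5}')


def first_match(regex, text):
    """Return the leftmost match as a string, or None if there is none."""
    m = regex.search(text)
    return m.group() if m else None


# Made-up sample strings for the comparison.
samples = ['xxAUGNAyy', 'AUGN', 'GGGGG', 'TTGCCTGCCT', '@SRR566546.970']
for s in samples:
    print(s, first_match(alternation, s), first_match(char_class, s))
```

The alternation works, but it is a regex with 3125 branches; the character class expresses the same thing in eleven characters.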

Separately, your regular expression attempt had no connection to your patterns variable: the pattern r'.*?(?:patterns)' matches zero or more arbitrary characters followed by the literal text 'patterns'.
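A minimal sketch of the question's loop rebuilt around the character class, not taken from either answer. It assumes the usual four-line FASTQ layout (header, sequence, `+`, quality) and uses `io.StringIO` holding the record from the question as a stand-in for `open(file_path)`. It also confirms that the original pattern only looks for the literal word `patterns`, and contrasts `re.match` (tries only the start of the line) with `re.search` (anywhere in the line).

```python
import io
import re

# The question's pattern: 'patterns' is literal text here, not the Python variable.
literal_hit = re.match(r'.*?(?:patterns)', 'these are patterns')
base_miss = re.match(r'.*?(?:patterns)', 'AUGNAAUGNA')

# Stand-in for open(file_path), assuming the four-line FASTQ layout.
fastq = io.StringIO(
    "@SRR566546.970 HWUSI-EAS1673_11067_FC7070M:4:1:2299:1109 length=50\n"
    "TTGCCTGCCTATCATTTTAGTGCCTGTGAGGTGGAGATGTGAGGATCAGT\n"
    "+\n"
    "hhhhhhhhhhghhghhhhhfhhhhhfffffeee[X]b[d[ed`[Y[^Y\n"
)

five_bases = re.compile(r'[AGCNU]{5}')

at_start = []  # lines whose first five characters are all in the class (re.match)
anywhere = []  # (line, leftmost 5-mer) for lines containing one anywhere (re.search)
for line in fastq:
    line = line.rstrip('\n')
    if five_bases.match(line):
        at_start.append(line)
    found = five_bases.search(line)
    if found:
        anywhere.append((line, found.group()))

print(literal_hit.group(), base_miss)
print(at_start)
print(anywhere)
```

With the question's `bases` list, which contains U but no T, the DNA read does not begin with five qualifying characters, so `at_start` stays empty, while `re.search` finds `GGAGA` inside the read. If T was intended, adding it to the class (for example `[ACGNTU]`) would be the change to try; that is an assumption about the data, not something the question states.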

Upvotes: 2
