Mondrianaire

Reputation: 103

Python Regex Problems

I am having trouble converting a regular expression to Python. I know that '(\\d+)' matches an integer, but I cannot figure out how to match a single digit in the range [2-9].

The regular expression is as follows:

[2-9][p-z][a-h][2-9][a-z]*[p-z][2-9][p-z][2-9][p-z]

This is my current expression, but it produces many false positives because it is not specific enough:

    re1='(\\d+)'    # Integer Number 1
    re2='([a-z])'   # Any Single Word Character (Not Whitespace) 1
    re3='([a-z])'   # Any Single Word Character (Not Whitespace) 2
    re4='(\\d+)'    # Integer Number 2
    re5='((?:[a-z][a-z]+))' # Word 1
    re6='(\\d+)'    # Integer Number 3
    re7='([a-z])'   # Any Single Word Character (Not Whitespace) 3
    re8='(.)'   # Any Single Character 1
    re9='([a-z])'   # Any Single Word Character (Not Whitespace) 4
    ## Regex search for passcodes ## Thanks to Pierluigi Failla
    rg = re.compile(re1+re2+re3+re4+re5+re6+re7+re8+re9,re.IGNORECASE|re.DOTALL)
    m = rg.search(txt)
    if m:
        int1=m.group(1)
        w1=m.group(2)
        w2=m.group(3)
        int2=m.group(4)
        word1=m.group(5)
        int3=m.group(6)
        w3=m.group(7)
        c1=m.group(8)
        w4=m.group(9)
        txt2='"'+int1+w1+w2+int2+word1+int3+w3+c1+w4+'"'
        return [txt2]

Upvotes: 0

Views: 901

Answers (2)

eyquem

Reputation: 27575

I propose this code, based on what I see in your question:

import re

pat = ('([2-9])'        # one digit, 2-9
       '([p-z])'        # one letter, p-z
       '([a-h])'        # one letter, a-h
       '([2-9])'        # one digit, 2-9
       '([a-z]*[p-z]+)' # zero or more letters a-z, then one or more letters p-z
       '([2-9])'        # one digit, 2-9
       '([p-z])'        # one letter, p-z
       '(.)'            # any single character
       '([p-z])'        # one letter, p-z
       )
rg = re.compile(pat)

txt = 'jiji4pa6fmlgkfmoaz8p#q,,,,,,,,,,'
m = rg.search(txt)
if m:
    text2 = "%s%s%s%s%s%s%s%s%s" % m.groups()
    print(text2)

# prints 4pa6fmlgkfmoaz8p#q

EDIT

text2 = ''.join(m.groups())  # is better
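
For comparison, here is a minimal sketch of a literal translation of the expression from the question. It is stricter than the pattern above in two places: it requires a digit in [2-9] where the pattern above accepts any character, and it takes exactly one [p-z] letter after the [a-z]* run. The names PASSCODE and find_passcode are made up for illustration, and the first sample string is the one used above with '#' swapped for '5' so that it satisfies the stricter pattern.

```python
import re

# Literal translation of the expression in the question.
# PASSCODE and find_passcode are hypothetical names, not from the original post.
PASSCODE = re.compile(r'[2-9][p-z][a-h][2-9][a-z]*[p-z][2-9][p-z][2-9][p-z]')

def find_passcode(txt):
    """Return the first substring matching the pattern, or None."""
    m = PASSCODE.search(txt)
    return m.group(0) if m else None

# Sample with a digit in the second-to-last position: matches.
print(find_passcode('jiji4pa6fmlgkfmoaz8p5q,,,,,,,,,,'))
# Original sample with '#' in that position: no match under the strict pattern.
print(find_passcode('jiji4pa6fmlgkfmoaz8p#q,,,,,,,,,,'))
```

If the individual pieces are needed, as in the question, the same character classes can be wrapped in parentheses and read back with m.groups().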

Upvotes: 1

Ian Knight

Reputation: 2476

You should be able to use the range 2-9 in Python, like so: re1 = re.compile(r'[2-9]'). A test in my console then showed that re1.match('7') returns a MatchObject as you want, whereas re1.match('0') returns None, also as you want.

You also appear to have used the range [a-z] in re2, where you said you wanted [p-z] - similar issues in the other character ranges.
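
A small sketch of the difference, using only the standard re module. The variable names and test strings are made up for illustration:

```python
import re

digit_2_to_9 = re.compile(r'[2-9]')   # exactly one digit from 2 to 9
any_integer = re.compile(r'\d+')      # one or more digits, 0-9
letter_p_to_z = re.compile(r'[p-z]')  # exactly one lowercase letter from p to z

print(digit_2_to_9.match('7') is not None)    # True
print(digit_2_to_9.match('0') is not None)    # False
print(any_integer.match('0') is not None)     # True: \d+ also accepts 0 and 1
print(letter_p_to_z.match('q') is not None)   # True
print(letter_p_to_z.match('b') is not None)   # False: [a-z] would accept this
```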

Upvotes: 2
