ruvinda

Regular expressions - python

I am new to regular expressions and wrote this to check whether idno has digits 0 to 9 in the first nine characters and V, v, X or x as the last one. Is the syntax correct? It raises an error saying it needs two arguments. Another problem is that idno should be exactly 10 characters long. I used separate code to validate that, but can I integrate it into this too?

if len(idno) is 10:
    if re.match("[0-9]{9}[VvXx],idno") == true:
        print "Valid"

Answers (1)

jonrsharpe

You have more wrong there than right, I'm afraid. Note the following:

  • You should really compare integers by equality (== 10), not identity (is 10). CPython caches small integers, so your current code happens to work, but that's an implementation detail you shouldn't rely on;
  • If you add $ (end of string) to the end of the pattern, the regular expression will only match strings ten characters long, making the len check unnecessary anyway;
  • The quotes are in the wrong place, so you're passing a single string to re.match rather than two arguments (the pattern, and the string you want to match it against): the comma and idno are both part of the pattern parameter;
  • 'true' != 'True': Python is case-sensitive, and the booleans start with capital letters;
  • re.match returns either a match object (an _sre.SRE_Match in Python 2) or None, and neither of those == True. In any case, writing == True is awkward even where you're only getting True or False: a match object is truthy and None is falsy, so you can write the much neater if some_thing: rather than if some_thing == True:; and
  • Regular expressions already have a shorthand for [0-9]: you can just use \d (digit). (In Python 3, \d in a str pattern also matches other Unicode decimal digits unless you pass the re.ASCII flag.)
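
A quick way to check several of these points yourself is a short script like the one below. It is only a sketch: the sample value 123456789V is made up for illustration, and the single-argument print() calls work on both Python 2.7 and Python 3.

```python
import re

idno = "123456789V"  # made-up sample value for illustration

# Compare lengths by value with ==, not by identity with `is`.
print(len(idno) == 10)  # True

# The pattern and the string to match are two separate arguments.
m = re.match(r"[0-9]{9}[VvXx]", idno)

# A successful match returns a match object, which is not equal to True...
print(m == True)  # False
# ...but it is truthy, so a plain `if m:` is enough.
print(bool(m))  # True

# A failed match returns None, which is falsy.
print(re.match(r"[0-9]{9}[VvXx]", "not an id") is None)  # True
```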

Your code should therefore be:

if re.match(r'\d{9}[VvXx]$', idno):
          # ^ note 'raw' string, to avoid escaping the backslash
    print "Valid"
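
If the check is needed in several places, one option is to compile the pattern once and wrap it in a function that returns a plain bool. This is a minimal sketch written for Python 3; the names ID_PATTERN and is_valid_idno are hypothetical, and the pattern is the one from the corrected code above.

```python
import re

# Hypothetical names; same pattern as the corrected code.
ID_PATTERN = re.compile(r"\d{9}[VvXx]$")


def is_valid_idno(idno):
    """Return True if idno is nine digits followed by V, v, X or x."""
    return ID_PATTERN.match(idno) is not None


for candidate in ("123456789V", "123456789x", "12345678V", "123456789Q"):
    print(candidate, is_valid_idno(candidate))
```

This prints True for the first two candidates and False for the last two (too few digits, wrong final letter). The re module also caches recently used patterns internally, so compiling up front is mainly about readability and reuse rather than speed.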

You could simplify this further by passing the re.IGNORECASE flag and reducing the character class for the last character to [vx]. A few examples:

>>> import re
>>> for test in ('123456789x', '123456789a', '123abc456x', '123456789xa'):
...     print test, re.match(r'\d{9}[vx]$', test, re.I)  # re.I is the short form of re.IGNORECASE
...
123456789x <_sre.SRE_Match object at 0x10041e308>  # valid
123456789a None  # wrong final letter
123abc456x None  # non-digits in first nine characters
123456789xa None  # first ten characters match, but there is an extra character
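
One subtlety with the $ anchor: in Python it matches at the end of the string or just before a single newline at the end, so a value such as '123456789x\n' (eleven characters) still passes. The sketch below compares $ with the stricter \Z anchor and with re.fullmatch; it assumes Python 3.4 or later, since re.fullmatch does not exist in Python 2.

```python
import re

value = "123456789x\n"  # ten valid characters plus a trailing newline

# `$` also matches just before a final newline, so this succeeds.
print(bool(re.match(r"\d{9}[vx]$", value, re.I)))  # True

# `\Z` only matches at the very end of the string.
print(bool(re.match(r"\d{9}[vx]\Z", value, re.I)))  # False

# re.fullmatch (Python 3.4+) requires the whole string to match.
print(bool(re.fullmatch(r"\d{9}[vx]", value, re.I)))  # False
print(bool(re.fullmatch(r"\d{9}[vx]", "123456789X", re.I)))  # True
```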
