srock

Reputation: 413

Using a regular expression to check that a string has a fixed format

I want to use a regex to make sure that a string has the format "999.999-A9-Won" and contains no spaces, tabs, or newline characters.

  1. It starts with 2 or 3 digits (0 - 9).
  2. Followed by a period '.'.
  3. Followed by another 2 or 3 digits (0 - 9).
  4. Followed by a hyphen, the character 'A', and a single digit (0 - 9).
  5. This can be followed by anything.

Example: 87.98-A8-abcdef

The code I have come up until now is:

testString = "87.98-A1-help"
regCompiled = re.compile('^[0-9][0-9][.][0-9][0-9][-A][0-9][-]*');
checkMatch = re.match(regCompiled, testString);
if checkMatch:
    print ("FOUND")
else:
    print("Not Found")

This doesn't seem to work, and I'm not sure what I'm missing. I'm also not checking for spaces, tabs, and newline characters, and I've hard-coded the number of digits before and after the decimal point.

Upvotes: 2

Views: 3755

Answers (2)

Martijn Pieters

Reputation: 1121524

With {m,n} you can specify the number of times a pattern can repeat, and the \d character class matches all digits. The \S character class matches anything that is not whitespace. Using these your regular expression can be simplified to:

re.compile(r'\d{2,3}\.\d{2,3}-A\d-\S*\Z')

Note also the \Z anchor, which makes the \S* expression match all the way to the end of the string. No whitespace (newlines, tabs, etc.) is allowed here. If you combine this with the .match() method, you ensure that every character in the tested string conforms to the pattern, nothing more, nothing less. See search() vs. match() for more information on .match().

A small demonstration:

>>> import re
>>> pattern = re.compile(r'\d{2,3}\.\d{2,3}-A\d-\S*\Z')
>>> pattern.match('87.98-A1-help')
<_sre.SRE_Match object at 0x1026905e0>
>>> pattern.match('123.45-A6-no whitespace allowed')
>>> pattern.match('123.45-A6-everything_else_is_allowed')
<_sre.SRE_Match object at 0x1026905e0>
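As a variation on the demonstration above, here is a minimal sketch that wraps the check in a function. It assumes Python 3.4 or later, where re.fullmatch() is available; because fullmatch() already requires the whole string to match, the \Z anchor is left out. The names PATTERN and is_valid are hypothetical and not part of the question.

```python
import re

# Same pattern as in the answer, minus the \Z anchor:
# fullmatch() only succeeds if the entire string matches.
PATTERN = re.compile(r'\d{2,3}\.\d{2,3}-A\d-\S*')


def is_valid(text):
    """Return True if text has the form 99.99-A9-<no whitespace>."""
    return PATTERN.fullmatch(text) is not None


print(is_valid('87.98-A1-help'))                     # True
print(is_valid('123.45-A6-no whitespace allowed'))   # False
print(is_valid('87.98-A1-help\n'))                   # False
```

In Python 3, \d also matches non-ASCII decimal digits in str patterns; if only 0 - 9 should be accepted, [0-9] or the re.ASCII flag can be used instead.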

Upvotes: 5

Simeon Visser

Reputation: 122336

Let's look at your regular expression. If you want:

"2 or 3 numbers in the range 0 - 9"

then you can't start your regular expression with '^[0-9][0-9][.]' because that will only match strings with exactly two digits at the beginning. A second issue with your regex is at the end: [0-9][-]* matches a digit followed by zero or more hyphens. If you wish to match anything at the end of the string, you need to finish your regular expression with .* instead. Edit: see Martijn Pieters's answer regarding whitespace in the regular expression.

Here is an updated regular expression:

import re

testString = "87.98-A1-help"
# Raw string so that \. reaches the regex engine unchanged
regCompiled = re.compile(r'^[0-9]{2,3}\.[0-9]{2,3}-A[0-9]-.*')
checkMatch = regCompiled.match(testString)
if checkMatch:
    print("FOUND")
else:
    print("Not Found")

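Not everything needs to be enclosed in [ and ], in particular when you know the exact character(s) that you wish to match (such as the part -A). In your original pattern, [-A] is a character class that matches a single character, either - or A, rather than the two-character sequence -A. Furthermore: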

  • the notation {m,n} means: match at least m times and at most n times, and
  • to explicitly match a dot, you need to escape it: that's why there is \. in the regular expression above.
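The points above can be checked with a few small experiments. This is only an illustrative sketch, assuming Python 3.4+ for re.fullmatch(); the variable names are made up for the example.

```python
import re

# [-A] is a character class: it matches ONE character, '-' or 'A'.
class_matches_pair = re.fullmatch(r'[-A]', '-A') is not None    # False
# -A without brackets matches the two-character sequence.
literal_matches_pair = re.fullmatch(r'-A', '-A') is not None    # True

# An unescaped dot matches any character (except a newline by default).
unescaped_dot = re.fullmatch(r'\d{2}.\d{2}', '87x98') is not None    # True
escaped_dot = re.fullmatch(r'\d{2}\.\d{2}', '87x98') is not None     # False

# {2,3} accepts two or three repetitions, no fewer and no more.
two_or_three = [bool(re.fullmatch(r'\d{2,3}', s))
                for s in ('1', '12', '123', '1234')]

print(class_matches_pair, literal_matches_pair)   # False True
print(unescaped_dot, escaped_dot)                 # True False
print(two_or_three)                               # [False, True, True, False]
```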

Upvotes: 3
