cavs
cavs

Reputation: 377

regex match exact pattern within string

if I have the following string 'some numbers 66666666666666666667867866 and serial 151283917503423 and 8888888' and I want to find 15 digit numbers (so only 151283917503423) how do I make it so that it doesn't match the bigger number and also deal with the possibility that the string can just be '151283917503423' therefore I cannot identify it by it possibly containing spaces on both sides?

serial = re.compile('[0-9]{15}')
serial.findall('some numbers 66666666666666666667867866 and serial 151283917503423 and 8888888')

this returns both 66666666666666666667867866 and 151283917503423 but I only want the latter

Upvotes: 3

Views: 777

Answers (4)

Juan Diego Godoy Robles
Juan Diego Godoy Robles

Reputation: 14955

Use word boundaries:

serial = re.compile(r'\b[0-9]{15}\b')

\b Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. For example, r'\bfoo\b' matches 'foo', 'foo.', '(foo)', 'bar foo baz' but not 'foobar' or 'foo3'. Inside a character range, \b represents the backspace character, for compatibility with Python’s string literals.

Upvotes: 5

user557597
user557597

Reputation:

Since word boundaries \b contain 2 assertions each, I would use a single assertion
instead.

(?<![0-9])[0-9]{15}(?![0-9])

should be quicker?

Upvotes: 1

timgeb
timgeb

Reputation: 78680

Include word boundaries. Let s be your string. You can use

 >>> re.findall(r'\b\d{15}\b' ,s)
 ['151283917503423']

where \b asserts a word boundary (^\w|\w$|\W\w|\w\W)

Upvotes: 2

anubhava
anubhava

Reputation: 785068

You need to use word boundaries to ensure you don't match unwanted text on either side of your match:

>>> serial = re.compile(r'\b\d{15}\b')
>>> serial.findall('some numbers 66666666666666666667867866 and serial 151283917503423 and 8888888')
['151283917503423']

Upvotes: 4

Related Questions