How to identify a series of numbers inside a paragraph

Question

I have a paragraph/sentence from which I want to identify

any series of number 6 digits or more
any series of numbers with a "-" (dash)

but I don't want to identify

any numbers preceded by a $(dollar)
any series of numbers with , (comma)

How can I achieve this?

The regex I tried is: r'(?:\s|^)(\d-?(\s)?){6,}(?=[?\s]|$)' but its not accurate.

I'm looking for these patterns inside a paragraph

123-456-789
123-456
123 456
123 456 789 It may also contain full stop(.) at the end too but it should ignore the following patterns
$123654
$ 123654
12,4569
123*123*7732
123h434k5454

The fourth bird · Accepted Answer

You could match what you don't want and capture in a group what you want to keep.

Using re.findall the group 1 values will be returned.

Afterwards you might filter out the empty strings.

(?



In parts


(? Assert a whitespace boundary on the left

(?: Non capture group


\$\s* Match a dollar sign, 0+ whitespace chars
\d+(?:\,\d+)? Match 1+ digits with an optional comma digits part
| Or
( Capture group 1


\d+ Match 1+ digits
(?:[ -]\d+)+\.? Repeat a space or - 1+ times followed by an optional .
| Or
\d{3,} Match 3 or more digits (Or use {6,} for 6 or more

) Close group 1

) Close non capture group
(?!\S) Assert a whitespace boundary on the right


Regex demo | Python demo | Another Python demo

For example

import re

regex = r"(?


Output

['123456', '1234567890', '12345', '1-2-3', '123-456-789', '123-456-789.', '123-456', '123 456', '123 456 789', '123 456 789.', '123 456 123 456 789', '123', '456', '123', '456', '789']

How to identify a series of numbers inside a paragraph

Answers (1)

Related Questions