Reputation: 11
I have an invoice in readable form, and I need to extract the PO number from it. The PO numbers come in a particular format (26123456, 26234567): each starts with 26 and has 6 digits following it. I am trying to extract it using regular expressions.
I have used this as my pattern:
[26]\d{6,6}
I have also tried this:
^[26]\d{6,6}
However, the problem I am facing is this: if the PO number is 26454545 and the invoice contains other numbers before it, such as telephone numbers, that include a 2 or a 6, the pattern extracts part of those as well. For example, part of 12345678987 is extracted too, since a 2 and a 6 are present in it.
Upvotes: 0
Views: 130
Reputation: 174796
Remove the character class and add word boundaries.
\b26\d{6}\b
[26] is a character class: it matches a single character from the given list, either 2 or 6. To match the number 26, just write the number as it is.
Adding \b at the start and at the end helps to match a complete number, since \b matches between a word character and a non-word character. You could also use lookaround assertions here, such as (?<!\d)26\d{6}(?!\d).
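As a rough illustration of the difference, here is a minimal Python sketch that runs the original pattern and the two fixes over a made-up line of invoice text. The sample string and variable names are assumptions for demonstration only, not taken from the real invoice.

```python
import re

# Hypothetical invoice text: a phone number, a real PO number,
# and a longer number that merely contains a PO-like run of digits.
text = "Tel: 12345678987, PO: 26454545, Ref: 926454545"

# Original attempt: [26] is a character class, so any 2 or any 6
# followed by six digits is accepted, even inside a longer number.
loose = re.findall(r"[26]\d{6}", text)

# Literal 26 plus word boundaries: only a standalone 8-digit number.
bounded = re.findall(r"\b26\d{6}\b", text)

# Lookaround variant: only requires that no digit sits on either side.
lookaround = re.findall(r"(?<!\d)26\d{6}(?!\d)", text)

print(loose)       # includes a fragment of the phone number
print(bounded)     # ['26454545']
print(lookaround)  # ['26454545']
```

One practical difference between the two fixes: \b treats letters and underscores as word characters, so a PO number glued to a prefix such as PO26454545 is not matched by the \b version, while the lookaround version still matches it because it only checks for neighbouring digits.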
There is another pattern that I want to extract: 12300012345. After the first three digits there are always 3 zeros followed by 5 digits.
\b\d{3}000\d{5}\b
If you want to combine both, use the regex alternation operator |
\b26\d{6}\b|\b\d{3}000\d{5}\b
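The combined pattern could be used along these lines in Python. This is only a sketch: the helper name `extract_numbers`, the constant `PO_OR_REF`, and the sample text are hypothetical and chosen for illustration.

```python
import re

# Either an 8-digit PO number starting with 26, or an 11-digit number
# of the form ddd000ddddd. Both must stand alone as whole "words".
PO_OR_REF = re.compile(r"\b26\d{6}\b|\b\d{3}000\d{5}\b")


def extract_numbers(text):
    """Return every substring matching either format, in order of appearance."""
    return PO_OR_REF.findall(text)


# Hypothetical invoice line with one number of each kind plus a phone number.
sample = "PO 26454545, account 12300012345, phone 12345678987"
print(extract_numbers(sample))  # ['26454545', '12300012345']
```

The phone number is skipped because it neither starts with 26 nor has 000 in positions four to six. Since both alternatives share the same boundaries, the pattern can equivalently be written as \b(?:26\d{6}|\d{3}000\d{5})\b.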
Upvotes: 1