Reputation: 3895
How do I write a regex that matches numbers and letters combined in a string in any order? For example, if I want to read some kind of invoice number, I have examples like this:
example 1: 32ah3af
example 2: 32ahPP-A2ah3af
example 3: 3A63af-3HHx3B-APe5y5-9OPiis
example 4: 3A63af 3HHx3B APe5y5 9OPiis
So each 'block' has a length between 3 and 7 characters (letters or numbers) that can be in any order (letters can be lowercase or uppercase). Each 'block' can start with a letter or with a number.
There can be one 'block' or at most 4 blocks, separated by ' ' or -.
I know that I can match the separators with \s or \-, but I have no idea how to match these kinds of blocks that may or may not be followed by a separator.
I tried with something like this:
([0-9]?[A-z]?){3,7}
But it does not work.
Upvotes: 0
Views: 1102
Reputation: 163467
You could use
^[A-Za-z0-9]{3,7}(?:[ -][A-Za-z0-9]{3,7}){0,3}\b
The pattern matches:
^
Start of string
[A-Za-z0-9]{3,7}
Match 3-7 times either a lowercase or uppercase char a-z or a digit 0-9
(?:
Non-capturing group
[ -][A-Za-z0-9]{3,7}
Match either a space or -, followed by 3-7 times either a lowercase or uppercase char a-z or a digit 0-9
){0,3}
Close the non-capturing group and repeat it 0-3 times, for a maximum of 4 blocks
\b
A word boundary to prevent a partial match
Note that [A-z] matches more than [A-Za-z]: it also matches the ASCII characters that sit between Z and a, namely [, \, ], ^, _ and `.
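To try the pattern quickly, here is a minimal Python sketch. The helper name match_invoice and the sample list are only illustrative, not part of the question. It also lists the extra characters that [A-z] lets through.

```python
import re

# Pattern from the answer above; the trailing \b stops a match from ending mid-block.
INVOICE_RE = re.compile(r'^[A-Za-z0-9]{3,7}(?:[ -][A-Za-z0-9]{3,7}){0,3}\b')

def match_invoice(text):
    """Return the invoice number at the start of text, or None if there is none."""
    m = INVOICE_RE.match(text)
    return m.group(0) if m else None

samples = [
    "32ah3af",
    "32ahPP-A2ah3af",
    "3A63af-3HHx3B-APe5y5-9OPiis",
    "3A63af 3HHx3B APe5y5 9OPiis",
    "ab",        # too short: no match
    "abcdefgh",  # single block of 8 chars: no match
]
for s in samples:
    print(repr(s), "->", match_invoice(s))

# ASCII characters matched by [A-z] that are not letters.
extra = [c for c in map(chr, range(128))
         if re.fullmatch(r'[A-z]', c) and not c.isalpha()]
print(extra)  # ['[', '\\', ']', '^', '_', '`']
```

Because the pattern is anchored only at the start, a string with five blocks still matches on its first four. If the whole string has to be a valid invoice number, replacing \b with $ (or using re.fullmatch) is one option.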
Upvotes: 2
Reputation: 16700
As long as you only want to capture or search for the invoice IDs, the suggestion from Hao Wu is valid. Use this as the regex:
r'\w{3,7}'
If you do not need the rest of the string, then this should be enough.
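A small sketch of that approach with re.findall, assuming the input holds nothing but the invoice number (the variable names are just for illustration). Keep in mind that \w also matches the underscore and, for str patterns in Python 3, non-ASCII letters and digits.

```python
import re

# Assumption: the input holds only the invoice number, nothing else.
invoice = "3A63af-3HHx3B-APe5y5-9OPiis"
blocks = re.findall(r'\w{3,7}', invoice)
print(blocks)  # ['3A63af', '3HHx3B', 'APe5y5', '9OPiis']

# Caveat: a run longer than 7 characters is split rather than rejected.
too_long = re.findall(r'\w{3,7}', "abcdefghij")
print(too_long)  # ['abcdefg', 'hij']
```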
You can capture the whole line more precisely, including the example 1 label, with:
r'example (\d+): ((\w{3,7}[\- ]?)+)'
Please note how the capturing groups are represented: the outer group holds the whole invoice number, while the inner, repeated group keeps only the last block it matched.
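For instance, in Python the groups come out like this (a sketch; the sample line and variable names are mine). Note that this pattern does not limit the number of blocks to 4.

```python
import re

LINE_RE = re.compile(r'example (\d+): ((\w{3,7}[\- ]?)+)')

line = "example 3: 3A63af-3HHx3B-APe5y5-9OPiis"
m = LINE_RE.search(line)

# group 1: example number, group 2: whole invoice number,
# group 3: only the LAST repetition of the inner group.
number, invoice, last_block = m.groups()
print(number)      # 3
print(invoice)     # 3A63af-3HHx3B-APe5y5-9OPiis
print(last_block)  # 9OPiis

# To get every block, split the full invoice number on the separators.
blocks = re.split(r'[- ]', invoice)
print(blocks)      # ['3A63af', '3HHx3B', 'APe5y5', '9OPiis']
```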
Upvotes: 0