Reputation: 25
I have multiple documents in which I need to match some codes. The codes have this structure: CRL-LLL-LLL-LLL-LLL-DDDDDD | Example: CRL-SYW-CON-LKA-TMP-800001
Most cases are correct; however, there are some cases like this: CRL-SYW-CON-LKA-TMP-XXXXXX (because they don't know the number)
Please look at this: https://gyazo.com/c950b3f687929d19fc7b2cf63cc9721c
Just testing in Sublime Text, I came up with this: .*-\d{6} but it matches more than I need:
https://gyazo.com/e410a9267199f567b9d146ce9c3f1839
An idea could be something like this:
Thank you!!
Upvotes: 2
Views: 78
Reputation: 626689
You can use
re.findall(r'\bCRL(?:-[A-Z]{3}){4}-(?:\d{6}|X{6})\b', text)
See the regex demo.
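As a minimal runnable sketch of the call above (the sample `text` and the name `CODE_RE` are made up for illustration, not taken from your documents):

```python
import re

# Pattern from the answer: "CRL", four "-LLL" groups of uppercase letters,
# then a hyphen followed by either six digits or six literal X characters.
CODE_RE = re.compile(r'\bCRL(?:-[A-Z]{3}){4}-(?:\d{6}|X{6})\b')

# Hypothetical sample text: one numbered code, one XXXXXX placeholder,
# and one malformed code with only five digits.
text = (
    "Ref CRL-SYW-CON-LKA-TMP-800001 is approved; "
    "CRL-SYW-CON-LKA-TMP-XXXXXX is pending; "
    "CRL-SYW-CON-LKA-TMP-80001 is too short."
)

# The pattern has no capturing groups, so findall returns the full matches.
codes = CODE_RE.findall(text)
print(codes)
# → ['CRL-SYW-CON-LKA-TMP-800001', 'CRL-SYW-CON-LKA-TMP-XXXXXX']
```

The five-digit code is skipped because neither alternative in the last group can match it.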
Details:
\b - a word boundary
CRL - a CRL string
(?:-[A-Z]{3}){4} - four repetitions of - and 3 uppercase letters
- - a hyphen
(?:\d{6}|X{6}) - six digits or six X chars
\b - a word boundary
Upvotes: 1