Reputation: 47
Does anyone know how to match a substring with a regular expression? I am currently profiling data and I saw different formats such as:
EB0000000
EB00000000PHL00000000F00000000
P0000000A
When I used my expression:
\b(?:[A-Z]{1}\d{7}[A-Z]{1}|[A-Z]{1}\d{7,8}|[A-Z]{2}\d{6}|[A-Z]{2}\d{7,8})\b
I captured the first and last samples. The second looks like improper data, but I still want to capture EB and the 8 digits before PHL. Is that possible with a regex? TIA
Upvotes: 1
Views: 85
Reputation: 163362
It is possible. You could change the order of the alternatives so that the most specific one comes first, and then remove the word boundary at the end so the match can stop in the middle of a longer token.
Note that you can omit {1}, since a single token already matches exactly once.
\b(?:[A-Z]{2}\d{7,8}|[A-Z]\d{7}[A-Z]|[A-Z]\d{7,8}|[A-Z]{2}\d{6})
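As a quick check, here is a minimal Python sketch that runs both the pattern from the question and the reordered one against the three samples. The names ORIGINAL, REORDERED, SAMPLES and first_match are made up for this illustration, and Python's re module is only an assumption, since the question does not say which regex engine is used.

```python
import re

# Pattern from the question, with the trailing word boundary.
ORIGINAL = re.compile(
    r"\b(?:[A-Z]{1}\d{7}[A-Z]{1}|[A-Z]{1}\d{7,8}|[A-Z]{2}\d{6}|[A-Z]{2}\d{7,8})\b"
)

# Reordered pattern from this answer, without the trailing word boundary.
REORDERED = re.compile(
    r"\b(?:[A-Z]{2}\d{7,8}|[A-Z]\d{7}[A-Z]|[A-Z]\d{7,8}|[A-Z]{2}\d{6})"
)

SAMPLES = [
    "EB0000000",
    "EB00000000PHL00000000F00000000",
    "P0000000A",
]


def first_match(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


for sample in SAMPLES:
    print(sample, first_match(ORIGINAL, sample), first_match(REORDERED, sample))
```

With these samples the original pattern finds nothing in the second string, while the reordered one returns EB00000000. One trade-off of dropping the trailing \b: a longer run such as two letters followed by nine digits would now be matched up to its first eight digits, so it depends on whether such values should count.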
In parts:
\b
Word boundary
(?:
Non capture group
[A-Z]{2}\d{7,8}
Match 2 times A-Z and 7-8 digits
|
Or
[A-Z]\d{7}[A-Z]
Match A-Z, 7 digits and A-Z
|
Or
[A-Z]\d{7,8}
Match A-Z and 7-8 digits
|
Or
[A-Z]{2}\d{6}
Match 2 times A-Z and 6 digits
)
Close group
Upvotes: 1
Reputation: 36
Why make it so complicated? Or are there other strings nearby that should not be matched?
\b[A-Z\d]{8,}\b
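A small Python sketch of what this shorter pattern returns for the three samples from the question. The names WHOLE_TOKEN and SAMPLES are made up for the illustration, and Python's re module is an assumption, since the question does not name an engine.

```python
import re

# Any run of 8 or more uppercase letters/digits between word boundaries.
WHOLE_TOKEN = re.compile(r"\b[A-Z\d]{8,}\b")

SAMPLES = [
    "EB0000000",
    "EB00000000PHL00000000F00000000",
    "P0000000A",
]

# Map each sample to the list of substrings the pattern finds in it.
results = {sample: WHOLE_TOKEN.findall(sample) for sample in SAMPLES}

for sample, matches in results.items():
    print(sample, matches)
```

Note that for the second sample this returns the whole 30-character string rather than only EB and the 8 digits before PHL, so it fits if the goal is to flag the token, not to extract the prefix.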
Upvotes: 2