Reputation: 1365
I have an email body containing several lines of text. I need to extract the first occurrence of a dash-separated string that appears after a specific piece of text.
The shape of the dashed string is unknown: each block may contain any number of letters and digits, e.g. AA3A-123-NNN-D or 12-OOO-12455-AS
For example:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec imperdiet porta libero ac imperdiet.
Nam enim nisl: aliquam ut feugiat vitae
Specific text after which I need to search: Etiam rhoncus AAFA-12X-DDDD-12 metus risus More text: foo
Target language is C#.
I've tried something like ([A-Za-z0-9]{5}-[A-Za-z0-9]{4}-[A-Za-z0-9]{3}-[A-Za-z0-9]{5})
but that forces me to fix the length of each block, which is not always known.
Upvotes: 4
Views: 3286
Reputation: 4874
I'd go with something like (?:[a-zA-Z0-9]+-){3,}[a-zA-Z0-9]+. What this will do is match 3 or more groups of alphanumerics ending with a dash, followed by one that doesn't.
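To tie this pattern to the "first occurrence after a specific text" requirement, here is a minimal C# sketch. The class name `DashedCodeFinder`, the method `FindAfter`, and the `marker` parameter are made up for illustration. It assumes the marker text is known verbatim and returns an empty string when nothing is found.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DashedCodeFinder
{
    // Three or more "alphanumerics + dash" groups, then one final group.
    private static readonly Regex DashedCode =
        new Regex(@"(?:[a-zA-Z0-9]+-){3,}[a-zA-Z0-9]+");

    // Hypothetical helper: first dashed code after `marker`,
    // or an empty string if the marker or a code is missing.
    public static string FindAfter(string body, string marker)
    {
        int start = body.IndexOf(marker, StringComparison.Ordinal);
        if (start < 0)
        {
            return string.Empty;
        }

        // Regex.Match(input, startat) starts scanning at the given index.
        Match m = DashedCode.Match(body, start + marker.Length);
        return m.Success ? m.Value : string.Empty;
    }
}
```

Called with the sample email body and the marker "Specific text after which I need to search:", this should yield AAFA-12X-DDDD-12.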
Upvotes: 1
Reputation: 43189
You can use a lazy quantifier with [\s\S]:
(?:Specific\ text\ after\ which\ I\ need\ to\ search:)
[\s\S]+?\K
(\b\w+-\w+-\w+-\w+\b)
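The pattern is written in free-spacing mode, which is why the literal spaces are escaped. The \b is a word boundary, and \K resets the start of the reported match, so everything matched to the left of it is dropped from the result. Note that \K is a PCRE feature and is not supported by the .NET regex engine; in C#, leave out the \K and read capture group 1 instead.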
See a demo on regex101.com.
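Since the question targets C#, here is a rough .NET adaptation of the pattern above, as a sketch rather than a drop-in answer. It drops \K, keeps the free-spacing layout via `RegexOptions.IgnorePatternWhitespace`, and reads the dashed string from group 1. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AfterMarkerFinder
{
    // Same structure as the PCRE pattern, minus \K. In free-spacing mode
    // unescaped whitespace is ignored, so literal spaces stay escaped.
    private static readonly Regex Pattern = new Regex(@"
        Specific\ text\ after\ which\ I\ need\ to\ search:
        [\s\S]+?                  # lazily skip ahead, newlines included
        (\b\w+-\w+-\w+-\w+\b)     # group 1: four dash-separated blocks
        ", RegexOptions.IgnorePatternWhitespace);

    // Returns the captured dashed string, or an empty string if none.
    public static string Find(string body)
    {
        Match m = Pattern.Match(body);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

Keep in mind that \w also accepts underscores, so this is slightly more permissive than an explicit [A-Za-z0-9] class.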
Upvotes: 4
Reputation: 323
If your expression contains an unknown number of letters and numbers, the best you can do is specify a length range for each block in your regular expression. In your examples, the longest block has 5 characters and the shortest has 1.
So something like this will capture it:
([A-Za-z0-9]{1,5}-[A-Za-z0-9]{1,5}-[A-Za-z0-9]{1,5}-[A-Za-z0-9]{1,5})
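One thing worth checking with a bounded range is what happens when a block is longer than the upper limit. The C# sketch below compares the pattern as written with a variant wrapped in \b word boundaries. The \b variant is an assumption added here for illustration, not part of the answer, and the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundedCodeFinder
{
    // The pattern from the answer: four blocks of 1 to 5 alphanumerics.
    private static readonly Regex Loose = new Regex(
        @"[A-Za-z0-9]{1,5}-[A-Za-z0-9]{1,5}-[A-Za-z0-9]{1,5}-[A-Za-z0-9]{1,5}");

    // Assumed variant: \b on both ends so over-long blocks are not trimmed.
    private static readonly Regex Strict = new Regex(
        @"\b[A-Za-z0-9]{1,5}-[A-Za-z0-9]{1,5}-[A-Za-z0-9]{1,5}-[A-Za-z0-9]{1,5}\b");

    // Both helpers return an empty string when there is no match.
    public static string FirstLoose(string text)
    {
        Match m = Loose.Match(text);
        return m.Success ? m.Value : string.Empty;
    }

    public static string FirstStrict(string text)
    {
        Match m = Strict.Match(text);
        return m.Success ? m.Value : string.Empty;
    }
}
```

Without the boundaries, an input such as ABCDEF-12-34-56 still matches, but only from the second character on, so the result silently loses the leading A. With the boundaries, that input produces no match at all.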
Upvotes: 1