Ashton

Reputation: 1365

Regex to match a string that contains at least 3 dashes / hyphens

I have an email body. It contains several lines of text. I need to extract the first occurrence of a string that:

  1. comes after a specific text
  2. contains at least 3 dashes

The shape of the dashed string is unknown. Each block may contain any number of letters and digits, e.g. AA3A-123-NNN-D or 12-OOO-12455-AS

For example:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec imperdiet porta libero ac imperdiet.

Nam enim nisl: aliquam ut feugiat vitae

Specific text after which I need to search: Etiam rhoncus AAFA-12X-DDDD-12 metus risus More text: foo

Target language is C#.

I've tried something like ([A-Za-z0-9]{5}-[A-Za-z0-9]{4}-[A-Za-z0-9]{3}-[A-Za-z0-9]{5}), but this requires fixing the length of every block in advance, and the shape of the string is not always known.

Upvotes: 4

Views: 3286

Answers (3)

Sebastian Lenartowicz

Reputation: 4874

I'd go with something like (?:[a-zA-Z0-9]+-){3,}[a-zA-Z0-9]+. This matches three or more runs of alphanumerics that each end with a dash, followed by one final run that doesn't.

Try it yourself on Regex101.
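
Since the target language is C#, here is a minimal sketch of how this pattern might be applied only to the text after the marker. The class name DashedCodeFinder and the method FindAfter are hypothetical names chosen for illustration; the sketch assumes the marker is known as a literal string and locates it with IndexOf before starting the regex search.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DashedCodeFinder
{
    // Hypothetical helper (not from the answer): returns the first token with
    // at least three dashes that appears after `marker`, or null if none.
    public static string FindAfter(string body, string marker)
    {
        int start = body.IndexOf(marker, StringComparison.Ordinal);
        if (start < 0) return null; // marker text not present

        // Three or more "alphanumerics + dash" groups, then a final run.
        var pattern = new Regex(@"(?:[a-zA-Z0-9]+-){3,}[a-zA-Z0-9]+");

        // Begin the search right after the marker so earlier text is ignored.
        Match m = pattern.Match(body, start + marker.Length);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        string body =
            "Nam enim nisl: aliquam ut feugiat vitae\n" +
            "Specific text after which I need to search: Etiam rhoncus AAFA-12X-DDDD-12 metus risus\n" +
            "More text: foo";

        // prints AAFA-12X-DDDD-12
        Console.WriteLine(FindAfter(body, "Specific text after which I need to search:"));
    }
}
```

Starting the match at an offset keeps the pattern itself independent of the marker text, so the marker needs no escaping.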

Upvotes: 1

Jan

Reputation: 43189

You can use a lazy quantifier with [\s\S]:

(?:Specific\ text\ after\ which\ I\ need\ to\ search:)
[\s\S]+?\K
(\b\w+-\w+-\w+-\w+\b)

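The \b is a word boundary, and \K discards everything matched so far from the reported match. The pattern is written in free-spacing mode, which is why the spaces are escaped. Note that \K is a PCRE feature and is not supported by the .NET regex engine, so in C# you need a capture group or a lookbehind instead.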
See a demo on regex101.com.
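
In C#, where \K is unavailable, the same idea can be expressed with a named capture group. The following is a sketch rather than a drop-in translation: the class name MarkerRegexDemo, the method Extract, and the group name code are hypothetical, and the quantifier {3,} is used so that strings with more than three dashes are matched in full.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MarkerRegexDemo
{
    // Hypothetical helper: returns the first dashed token after the marker,
    // or null when nothing matches.
    public static string Extract(string body)
    {
        // Regex.Escape makes the marker safe to embed as a literal.
        string marker = Regex.Escape("Specific text after which I need to search:");

        // [\s\S]+? lazily skips any characters (newlines included) up to the
        // first candidate; the named group stands in for what \K would keep.
        var pattern = new Regex(marker + @"[\s\S]+?\b(?<code>\w+(?:-\w+){3,})\b");

        Match m = pattern.Match(body);
        return m.Success ? m.Groups["code"].Value : null;
    }

    public static void Main()
    {
        string body =
            "Specific text after which I need to search: Etiam rhoncus AAFA-12X-DDDD-12 metus risus";

        // prints AAFA-12X-DDDD-12
        Console.WriteLine(Extract(body));
    }
}
```

Keep in mind that \w also matches the underscore, so this version is slightly more permissive than a character class restricted to letters and digits.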

Upvotes: 4

Fivestar

Reputation: 323

If each block contains an unknown number of letters and digits, one option is to specify a length range for each block in your regular expression. In your examples, the longest block has 5 characters and the shortest has 1.

So something like this will capture it:

([A-Za-z0-9]{1,5}-[A-Za-z0-9]{1,5}-[A-Za-z0-9]{1,5}-[A-Za-z0-9]{1,5})

Upvotes: 1
