Hogan

Reputation: 319

Regex: Trying to match a prefix anywhere in a string multiple times

I am working on a regex for my C# app and having trouble getting the matches I'm looking for...

The gist of the problem is that I'm trying to pick out strings that need to be translated and replace them with their internationalized counterparts. The regex is for picking out the translatable resources. We've decided to prefix all translatable resources with "OH_". Putting them back to back seems to be the issue with the regex. Do I need to state that they must be separated by a space at a minimum?

OH_OrderItemStatusChanged
Style1PS1A1OH_OrderItemStatusSpacerOH_OrderItemStatusID_2
(OH_OrderItemSentTo )  (OH_SalesRep )

My Regex is OH_\w+

It finds the following matches:

OH_OrderItemStatusChanged
OH_OrderItemStatusSpacerOH_OrderItemStatusID_2
OH_OrderItemSentTo
OH_SalesRep

The second match should actually be two matches:

OH_OrderItemStatusSpacer
OH_OrderItemStatusID_2

I've looked at several examples and can't find what I'm looking for. Is this something that can be done in a regex, or do I have to break it out?

Upvotes: 3

Views: 2483

Answers (3)

gpmurthy

Reputation: 2427

Consider the following Regex...

OH_.*?(?=(OH_|\r|\)))

Upvotes: 1

p.s.w.g

Reputation: 149040

Tim Pietzcker's solution is excellent, but here's an alternative:

(OH_\w+?)+\b

This will match OH_ followed by one or more word characters, non-greedily, and it allows that group to be repeated one or more times before a word boundary (\b). This means you'll have to inspect the Captures collection of the group to get all the results. For example:

var input = "OH_OrderItemStatusSpacerOH_OrderItemStatusID_2";
var matches = Regex.Matches(input, @"(OH_\w+?)+\b");
foreach(Capture c in matches[0].Groups[1].Captures)
    Console.WriteLine(c.Value);

This will produce:

OH_OrderItemStatusSpacer
OH_OrderItemStatusID_2

Upvotes: 1

Tim Pietzcker

Reputation: 336408

OH_\w+

is a good start, but of course \w+ also matches OH_, so you need to exclude that from the match. This requires the use of a negative lookahead assertion:

OH_(?:(?!OH_)\w)+

Explanation:

OH_       # Match OH_.
(?:       # Start of non-capturing group:
 (?!OH_)  # Assert that we're not at the start of another OH_ prefix,
 \w       # then match a word character (letter, digit, or underscore).
)+        # Repeat as often as possible.
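
As a rough sketch of how this pattern might be used from C#, the snippet below runs it over the problem string from the question. The class name ResourceKeyFinder and the helper FindKeys are made up for illustration and are not part of the original code; only Regex.Matches and the pattern itself come from the discussion above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ResourceKeyFinder
{
    // Pattern from the answer: OH_, then word characters,
    // stopping before any position where another OH_ begins.
    private static readonly Regex KeyPattern = new Regex(@"OH_(?:(?!OH_)\w)+");

    // Hypothetical helper: returns every prefixed key found in the input, in order.
    public static List<string> FindKeys(string input)
    {
        var keys = new List<string>();
        foreach (Match m in KeyPattern.Matches(input))
            keys.Add(m.Value);
        return keys;
    }

    public static void Main()
    {
        var input = "Style1PS1A1OH_OrderItemStatusSpacerOH_OrderItemStatusID_2";
        foreach (var key in FindKeys(input))
            Console.WriteLine(key);
    }
}
```

Because each key comes back as its own Match, there is no need to dig through a Captures collection, and the same pattern can be handed to Regex.Replace when it is time to substitute the translated text.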

See it on regex101.

Upvotes: 4
