o17t H1H' S'k

Reputation: 2725

Regular expression with possible hyphen and then a limited number of word characters

I need a regex to match expressions that contain the string OKAY, then an optional hyphen, and then zero or one word characters. After this, any non-word character is accepted, followed by anything. For expressions that match, OKAY is changed to OK if no word character follows, and to e.g. OA if the following letter is A. If the hyphen exists, it is dropped.

OKAY         =>       OK
OKAY-        =>       OK
OKAYA        =>       OA
OKAY-A       =>       OA
OKAYAB       =>       OKAYAB          (no-match)
OKAY-AB      =>       OKAY-AB         (no-match)

The examples may be followed by e.g. .CD without changing the results:

OKAY.CD         =>       OK.CD
OKAY-.CD        =>       OK.CD
OKAYA.CD        =>       OA.CD
OKAY-A.CD       =>       OA.CD
OKAYAB.CD       =>       OKAYAB.CD          (no-match)
OKAY-AB.CD      =>       OKAY-AB.CD         (no-match)

My problem implementing this was that, since both the hyphen and the word character are optional, I get "lazy" matches that also match the unwanted cases. For the sake of education, I would appreciate examples both with and without lookaheads (if possible).

Upvotes: 0

Views: 225

Answers (3)

Andrew Clark

Reputation: 208405

Here is a regex that should work for you:

\bOKAY(?>-?)(\w)?([^\w\s]\S*)?(?!\S)

Since it isn't clear what language you are using, here is pseudocode for the replacement (an unmatched group is treated as empty):

"O" + (match.group(1) or "K") + (match.group(2) or "")

Here is a rubular: http://www.rubular.com/r/SE8MBkUUUo


edit: I made some changes in the above regex after the comments, but the description below does not reflect those changes. Here are the changes from the original regex:

  • Changed ^ to \b so it doesn't need to start at beginning of line
  • \W became [^\w\s], this prevents OKAY OKAY from being one match
  • Changed .* to \S* so the match will end at whitespace
  • Changed $ to (?!\S), which means "only match if we are at the end of the string or the next character is whitespace"; it could also be written as (?=\s|\z)

The really tricky part here is that a regex like ^OKAY-?(\w)?(\W.*)?$ looks like it would work, but it fails for a case like OKAY-AB: after backtracking, both -? and (\w)? match nothing, and (\W.*)? then matches the remainder of the string (-AB), so the string is wrongly accepted.

To fix this, -? must not be allowed to backtrack. This would be simple if .NET supported possessive quantifiers; we could just change it to -?+.
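To see the backtracking problem concretely, here is a small Python sketch that runs the naive pattern quoted above. Python is only an illustration here (the question targets .NET), and the helper name naive_groups is made up for this example.

```python
import re

# The naive pattern from the paragraph above, with ordinary backtracking quantifiers.
NAIVE = re.compile(r"^OKAY-?(\w)?(\W.*)?$")

def naive_groups(text):
    """Return the (letter, tail) groups if NAIVE matches, otherwise None."""
    m = NAIVE.match(text)
    return m.groups() if m else None
```

With this pattern, naive_groups("OKAYAB") is None as desired, but naive_groups("OKAY-AB") returns (None, "-AB"): the engine gives the hyphen back and lets the tail group swallow it.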

Unfortunately they aren't supported, so we need to use atomic grouping instead. (?>-?) will optionally match a -, but will forget all backtracking information as soon as it exits the group. Note that the atomic group does not capture, so (\w)? is capture group 1.
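For readers who want to try the idea outside .NET, here is a hedged Python sketch. It assumes an engine without atomic groups (Python's re module only gained (?>...) in version 3.11), so it emulates (?>-?) with the common lookahead-plus-backreference idiom (?=(-?))\1. That substitution is my own, not part of the answer, and it shifts the group numbers by one. The function name shorten is hypothetical.

```python
import re

# (?=(-?))\1 stands in for the atomic group (?>-?): the lookahead is not
# re-entered on backtracking, so whatever group 1 captured stays fixed.
# Group 2 is now the optional letter and group 3 the optional tail.
PATTERN = re.compile(r"\bOKAY(?=(-?))\1(\w)?([^\w\s]\S*)?(?!\S)")

def shorten(text):
    """Rewrite every matching OKAY token; leave everything else unchanged."""
    def repl(m):
        # Same shape as the pseudocode: "O" + letter-or-"K" + tail.
        return "O" + (m.group(2) or "K") + (m.group(3) or "")
    return PATTERN.sub(repl, text)
```

Under these assumptions, shorten("OKAY-A.CD") gives "OA.CD", while "OKAY-AB" and "OKAYAB.CD" come back unchanged because the hyphen can no longer be handed to the tail group.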

Upvotes: 2

murgatroid99

Reputation: 20242

To do this without lookaheads, you can use

^(OKAY)(((-\w?|\w)(\W.*)?)|[^-\w].*)?$

This matches the word "OKAY" and then an optional group that is one of two things: either a - followed by an optional word character (or a single word character with no hyphen), optionally followed by a non-word character and then anything; or a character that is neither a - nor a word character, followed by anything. The ^ and $ match the start and end of the string respectively, so it will only match exactly the acceptable strings.
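As a quick check of the pattern above against the question's examples, here is a minimal Python sketch. It only tests acceptance, since the answer does not specify a replacement; the name accepts is invented for the example.

```python
import re

# The anchored, lookahead-free pattern from this answer, copied verbatim.
ANCHORED = re.compile(r"^(OKAY)(((-\w?|\w)(\W.*)?)|[^-\w].*)?$")

def accepts(text):
    """True if the whole string is one of the acceptable forms."""
    return ANCHORED.match(text) is not None
```

Note that group 4 captures the hyphen together with the letter (for OKAY-A it holds "-A"), so a replacement built on this pattern would still have to strip the hyphen itself.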

Lookaheads would barely make a difference. The only change would be to put a lookahead ((?=...)) around everything after the "OKAY" group.

To use this with .NET, the only change needed would be to escape every \ in the string literal (or to use a C# verbatim string, @"...").

Upvotes: 1

kevlar1818

Reputation: 3125

I don't know .NET regex, but this is a start with preg-style matching:

OKAY-?(\w?)([^\w-]\w+)?\s*$

If $1 is empty, then output is OK$2

Otherwise, output is O$1$2.
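The two output rules can be sketched in Python as follows. This is an illustration only, with a made-up function name rewrite, and it treats an unmatched $2 as empty.

```python
import re

# The preg-style pattern from this answer, copied verbatim.
PREG_STYLE = re.compile(r"OKAY-?(\w?)([^\w-]\w+)?\s*$")

def rewrite(text):
    """Apply the OK$2 / O$1$2 rules; return the input unchanged on no match."""
    m = PREG_STYLE.search(text)
    if m is None:
        return text
    letter = m.group(1)        # "" when no word character follows OKAY
    tail = m.group(2) or ""    # None when nothing like ".CD" follows
    head = "O" + letter if letter else "OK"
    # Trailing whitespace consumed by \s* is dropped from the result.
    return text[:m.start()] + head + tail
```

This reproduces all twelve examples from the question. As the answer says, it is only a start: the tail group accepts a single separator followed by word characters, so an input such as OKAY.C.D does not match and is returned as-is.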

Upvotes: 1
