user1255276

Reputation: 563

C# - Negative Lookahead doesn't seem to work

I'm working in C# on .NET 4.5.

I need an expression that will look through a string and fail the match if the string has two or more capital letters in a row anywhere in the string.

What I think should be the correct pattern is this:

(?![A-Z]{2,})\w

Note: I tried both (?!...) and (?<!...).

I got the opposite to work, i.e. searching a string and returning true if there are two or more capitals in a row. That pattern is as follows:

(?=[A-Z]{2,})\w

But I need this to work with the negative lookahead pattern.

From all the posts I've read this should be the correct way to do it, but it's not working for me.

I've read through questions such as "C# regexp negative lookahead" and "Regex negative lookahead in c#", among others.

I don't want to list them all, but they all say more or less the same thing: just use the negative lookahead (?!...).

Can anyone see what I'm doing wrong for this not to work?

Edit:

added some examples:

  1. Hello - Should pass
  2. HEllo - Should fail
  3. heLLo - Should fail
  4. HELLO - should fail

Advanced version:

  1. Hello World - should pass
  2. Hello WOrld - should fail
  3. hello wORld - should fail
  4. hello WORLD - should fail

Upvotes: 5

Views: 6248

Answers (2)

user557597

Reputation:

You only need to FAIL a match if you are trying to match something.

What you are trying to match is the failure.

If [A-Z].*?[A-Z] matches, the string contains two capital letters.
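As a minimal C# sketch of this idea (the class and helper names are hypothetical, not from the question): the first lines of `Main` show why the unanchored pattern from the question still finds a match, and the helper simply negates the "failure" pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InvertedMatchDemo
{
    // The failure condition: two capital letters anywhere on one line.
    private static readonly Regex TwoCaps = new Regex(@"[A-Z].*?[A-Z]");

    // Hypothetical helper: the input passes when the failure pattern does NOT match.
    public static bool Passes(string input) => !TwoCaps.IsMatch(input);

    public static void Main()
    {
        // The question's pattern is unanchored, so the engine retries at every
        // position and succeeds at the first one not followed by two capitals.
        Match m = Regex.Match("HEllo", @"(?![A-Z]{2,})\w");
        Console.WriteLine($"{m.Success} at index {m.Index}: {m.Value}"); // True at index 1: E

        foreach (var s in new[] { "Hello", "HEllo", "heLLo", "HELLO" })
            Console.WriteLine($"{s}: {Passes(s)}"); // True, False, False, False
    }
}
```

Note that this particular pattern also rejects "Hello World", because the H and the W are two capitals on the same line.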

If the rule is only that no two capitals appear in a row, it's this (multi-line mode) -> ^[^A-Z\r\n]*(?:[A-Z](?![A-Z])[^A-Z\r\n]*)*$

To match a non-empty string, just add a simple assertion.

^(?!$)[^A-Z\r\n]*(?:[A-Z](?![A-Z])[^A-Z\r\n]*)*$

To cover all Unicode uppercase letters, use the \p{Lu} form:

^[^\p{Lu}\r\n]*(?:\p{Lu}(?!\p{Lu})[^\p{Lu}\r\n]*)*$
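A short C# sketch, assuming each sample is tested as its own string (so RegexOptions.Multiline is not needed here); the class and field names are made up for illustration. It runs the Unicode pattern over the examples from the question and checks the non-empty variant against an empty string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NoAdjacentCapsDemo
{
    // Every uppercase letter must not be followed directly by another one.
    public static readonly Regex NoAdjacentCaps =
        new Regex(@"^[^\p{Lu}\r\n]*(?:\p{Lu}(?!\p{Lu})[^\p{Lu}\r\n]*)*$");

    // ASCII variant with the (?!$) assertion, which rejects the empty string.
    public static readonly Regex NonEmpty =
        new Regex(@"^(?!$)[^A-Z\r\n]*(?:[A-Z](?![A-Z])[^A-Z\r\n]*)*$");

    public static void Main()
    {
        string[] inputs =
        {
            "Hello", "HEllo", "heLLo", "HELLO",
            "Hello World", "Hello WOrld", "hello wORld", "hello WORLD"
        };
        foreach (var s in inputs)
            Console.WriteLine($"{s}: {NoAdjacentCaps.IsMatch(s)}");

        Console.WriteLine(NoAdjacentCaps.IsMatch("")); // True: empty input is accepted
        Console.WriteLine(NonEmpty.IsMatch(""));       // False: (?!$) rejects it
    }
}
```

Only "Hello" and "Hello World" match. A string such as "HeLlo" also matches, since its capitals are not adjacent.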


Input:

1.Hello - Should pass
2.HEllo - Should fail
3.heLLo - Should fail
4.HELLO - should fail

Advanced version:
1.Hello World - should pass
2.Hello WOrld - should fail
3.hello wORld - should fail
4.hello WORLD - should fail

Benchmark

Regex1:   ^(?!.*\b\w*\p{Lu}\w*\p{Lu}).*$
Options:  < ICU - m >
Completed iterations:   80  /  80     ( x 1000 )
Matches found per iteration:   5
Elapsed Time:    8.28 s,   8279.28 ms,   8279281 µs


Regex2:   ^[^\p{Lu}\r\n]*(?:\p{Lu}(?!\p{Lu})[^\p{Lu}\r\n]*)*$
Options:  < ICU - m >
Completed iterations:   80  /  80     ( x 1000 )
Matches found per iteration:   5
Elapsed Time:    3.88 s,   3875.04 ms,   3875039 µs

Upvotes: 1

Wiktor Stribiżew

Reputation: 626699

You can use the following regex:

^(?!.*\b\w*\p{Lu}\w*\p{Lu}).*$

See regex demo

It will match an empty string, too, but you can use the + quantifier instead of * to require at least 1 character.

To match a newline with this pattern, you will need to use the RegexOptions.Singleline modifier.
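The two notes above (empty input and newlines) can be checked with a small C# sketch. The constant names are hypothetical, and the `Plus` pattern is just the same regex with `.+` in place of `.*`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierAndSinglelineDemo
{
    public const string Star = @"^(?!.*\b\w*\p{Lu}\w*\p{Lu}).*$";
    public const string Plus = @"^(?!.*\b\w*\p{Lu}\w*\p{Lu}).+$";

    public static void Main()
    {
        // Empty input: accepted with *, rejected with +.
        Console.WriteLine(Regex.IsMatch("", Star)); // True
        Console.WriteLine(Regex.IsMatch("", Plus)); // False

        // Without Singleline, '.' stops at '\n', so .*$ cannot reach the end.
        string twoLines = "Hello\nWorld";
        Console.WriteLine(Regex.IsMatch(twoLines, Star));                           // False
        Console.WriteLine(Regex.IsMatch(twoLines, Star, RegexOptions.Singleline));  // True

        // With Singleline the lookahead also sees the second line.
        Console.WriteLine(Regex.IsMatch("Hello\nWOrld", Star, RegexOptions.Singleline)); // False
    }
}
```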

The negative lookahead (?!.*\b\w*\p{Lu}\w*\p{Lu}), anchored at the start of the string, will fail the match once a word is found that starts with zero or more word characters, followed by a capital letter, again followed by zero or more word characters and then another uppercase letter. You can shorten this with a limiting quantifier: ^(?!.*\b(?:\w*\p{Lu}){2}).*$.
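As a rough C# sketch (class and field names are illustrative only), both forms of the pattern can be run against the examples from the question to confirm they agree:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SameWordCapsDemo
{
    public static readonly Regex Full =
        new Regex(@"^(?!.*\b\w*\p{Lu}\w*\p{Lu}).*$");

    // Shortened form using the {2} limiting quantifier.
    public static readonly Regex Short =
        new Regex(@"^(?!.*\b(?:\w*\p{Lu}){2}).*$");

    public static void Main()
    {
        string[] inputs =
        {
            "Hello", "HEllo", "heLLo", "HELLO",
            "Hello World", "Hello WOrld", "hello wORld", "hello WORLD",
            "HeLlo"
        };
        foreach (var s in inputs)
            Console.WriteLine($"{s}: {Full.IsMatch(s)} / {Short.IsMatch(s)}");
    }
}
```

Only "Hello" and "Hello World" match. "HeLlo" is rejected because both capitals sit in the same word, even though they are not adjacent, so this pattern is stricter than one that only forbids consecutive capitals.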

Upvotes: 3
