Davide De Santis

Reputation: 1034

Get un-prefixed/un-escaped text with RegEx

I have the following input text:

A B C D E \F G H I JKL \M

and I would like to match every character that does not have a \ as a prefix, each character individually. So basically, I'd like to get A, B, C, D, E, G, H, I, J, K and L as matches, with F and M not passing because they are prefixed/escaped.

I got as far as

([^\\]([A-Z]{1}))

which works, but not exactly as expected:

- A is ignored, because there is nothing before it (and I am testing for anything but the backslash)
- each letter is matched together with the space before it
- JKL is matched as J with a space before it, and KL as one string.

I have tried various other variations with parentheses but was not successful.

Upvotes: 2

Views: 35

Answers (1)

Wiktor Stribiżew

Reputation: 627101

The negated character class [^\\] is a consuming pattern: it matches a character (here, the space before each letter), adds it to the match value and advances the regex index past it.
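To make the consumption visible, here is a minimal C# sketch (assuming C# 9+ top-level statements; the variable names are only illustrative) that runs the pattern from the question against the sample input and prints every match with its start index and the letter captured by group 2:

```csharp
using System;
using System.Text.RegularExpressions;

// The question's pattern: [^\\] consumes one non-backslash character before each letter
var input = @"A B C D E \F G H I JKL \M";
var consuming = Regex.Matches(input, @"([^\\]([A-Z]{1}))");

foreach (Match m in consuming)
{
    // Whole match value, where it starts, and the letter captured by group 2
    Console.WriteLine($"'{m.Value}' at index {m.Index}, letter = {m.Groups[2].Value}");
}
```

The matches come out as " B", " C", " D", " E", " G", " H", " I", " J" and "KL": A is skipped because [^\\] has no character to consume in front of it, every match carries the preceding space, and in JKL the K is eaten by [^\\] as part of "KL", so group 2 never sees it.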

Use a non-consuming negative lookbehind:

(?<!\\)[A-Z]
^^^^^^^

See the regex demo. Being a non-consuming pattern, (?<!\\) only checks whether there is a backslash immediately before an ASCII uppercase letter; if there is one, the engine fails the match at that position. If there is no \ (including at the start of the string, as with A), the letter is matched, and since the lookbehind consumes nothing, the preceding character is not added to the match value.

C# code:

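using System.Linq;
using System.Text.RegularExpressions;
var s = @"A B C D E \F G H I JKL \M";
// Match an uppercase letter only if it is not preceded by a backslash
var results = Regex.Matches(s, @"(?<!\\)[A-Z]")
        .Cast<Match>()
        .Select(m => m.Value)
        .ToList();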
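If you ever need the same result in a regex flavor without lookbehind support, a common workaround (a different technique from the lookbehind above, shown here as a C# sketch with illustrative names) is to match escaped pairs in one alternation branch and discard them, while capturing unescaped letters in the other branch:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var s = @"A B C D E \F G H I JKL \M";

// First branch consumes an escaped pair (\F, \M) so its letter is skipped;
// second branch captures an unescaped letter in group 1.
var letters = Regex.Matches(s, @"\\[A-Z]|([A-Z])")
    .Cast<Match>()
    .Where(m => m.Groups[1].Success)   // keep only matches from the unescaped-letter branch
    .Select(m => m.Groups[1].Value)
    .ToList();

Console.WriteLine(string.Join(", ", letters));
```

Because \F and \M are consumed whole by the first branch, their letters can never be picked up by the second one. In .NET, which supports lookbehind, (?<!\\)[A-Z] is the simpler choice; this version just trades the lookbehind for a Groups[1].Success filter.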

Upvotes: 2
