Hasen

Reputation: 12346

Regex exclude specific character not working

I searched and found that [^?] excludes a given character (a question mark in this case), but it seems to match the following space instead, which is not what I want. This pattern:

\((.*?)\)[^?]

matches anything in brackets unless there is a question mark right after the last bracket.

(need to capture including brackets) ignore this
(ignore this completely)?

This pattern captures the top line in brackets correctly without including the space, but also captures the line below which I want to ignore:

\((.*?)\)

What pattern can I use to capture the top line only without the trailing space but ignore the line below?

You can see that neither of these patterns works correctly:

https://regex101.com/r/fHXJ8x/1

https://regex101.com/r/fHXJ8x/2

Upvotes: 2

Views: 2523

Answers (4)

GoWiser

Reputation: 1055

Here is an example program written in C#. The code comments describe what changed following the feedback in the comments section, and the regexes are listed in the order in which they appeared in this post.

// Program has been modified in accordance with the debate in the comments section
using System;
using System.Text.RegularExpressions;

namespace CS_Regex
{
    class Program
    {
        // Match parenthesized texts that aren't followed by a question mark.
        static void Main(string[] args)
        {
            string[] tests =
            {
                "(match this text) ignore this (ignore this)? and (match this) (and this)"
            };
            // The first three patterns match only if the right parenthesis is followed by another character (other than '?').
            // The last pattern also matches a parenthesized text at the very end of the input.
            string[] patterns = {
                @"\((.*?)\)[^?]", // Original regex
                @"\((.*)\)[^?]", // Greedy variant: my first example, which prompted the discussion in the comments.
                                 // I asked "Why do you have a question mark after matching zero or more characters?"
                @"(\([^)]*\))[^?]", // Matches only if the right parenthesis is followed by a character other than '?', avoiding the lazy '?' quantifier.
                @"(\([^)]*\))(?!\?)", // Negative lookahead: matches every parenthesized text that is not followed by '?', even at the end of the input.
            };
            foreach (string pattern in patterns) {
                Regex rx = new Regex(pattern, RegexOptions.Compiled);
                Console.WriteLine($"Regex: {pattern}");
                foreach (string data in tests)
                {
                    MatchCollection matches = rx.Matches(data);
                    Console.WriteLine($"{matches.Count} matches found in: {data}");
                    foreach (Match match in matches)
                        Console.WriteLine($"   matched value and group: '{match.Value}' and '{match.Groups[1]}'");
                }
            }
            Console.ReadKey();
        }
    }
}

The program produces the following output:

Regex: \((.*?)\)[^?]
2 matches found in: (match this text) ignore this (ignore this)? and (match this) (and this)
   matched value and group: '(match this text) ' and 'match this text'
   matched value and group: '(ignore this)? and (match this) ' and 'ignore this)? and (match this'
Regex: \((.*)\)[^?]
1 matches found in: (match this text) ignore this (ignore this)? and (match this) (and this)
   matched value and group: '(match this text) ignore this (ignore this)? and (match this) ' and 'match this text) ignore this (ignore this)? and (match this'
Regex: (\([^)]*\))[^?]
2 matches found in: (match this text) ignore this (ignore this)? and (match this) (and this)
   matched value and group: '(match this text) ' and '(match this text)'
   matched value and group: '(match this) ' and '(match this)'
Regex: (\([^)]*\))(?!\?)
3 matches found in: (match this text) ignore this (ignore this)? and (match this) (and this)
   matched value and group: '(match this text)' and '(match this text)'
   matched value and group: '(match this)' and '(match this)'
   matched value and group: '(and this)' and '(and this)'

The example has been edited, to reflect the discussion in the comments.

Upvotes: -1

Wiktor Stribiżew

Reputation: 627536

First of all, you cannot use a negated character class ([^?]) because it is a consuming pattern, i.e. the regex engine puts the matched text into the match memory buffer and advances the regex index to the match end position. That is why it matches that whitespace. You need to use a negative lookahead that is a non-consuming pattern, (?!\?), that won't add the text matched into the match.
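To make the consuming versus non-consuming difference concrete, here is a minimal sketch. It uses Python's `re` module, which is my assumption (the question does not name a language), and the sample line from the question:

```python
import re

line = "(need to capture including brackets) ignore this"

# [^?] consumes one character after the ")", so the space becomes part of the match.
consuming = re.search(r"\((.*?)\)[^?]", line).group(0)

# (?!\?) only inspects the next character; nothing is added to the match.
lookahead = re.search(r"\((.*?)\)(?!\?)", line).group(0)

print(repr(consuming))  # '(need to capture including brackets) '
print(repr(lookahead))  # '(need to capture including brackets)'
```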

Second, you should not rely on .*? when you restrict the context of the subsequent pattern because this pattern can match any amount of any text (other than line break chars by default). If you have ... (...)? and () ..., the \(.*?\)(?!\?) will match the leftmost ( until the leftmost ) that is not immediately followed with a ? char, i.e. the match will be (...)? and (), see this regex demo.
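The overreach described here can be reproduced with a short sketch (again Python's `re`, chosen by me for illustration; the input is the `... (...)? and () ...` string from the paragraph above):

```python
import re

text = "... (...)? and () ..."

# When the lookahead rejects the first ")", the lazy .*? simply keeps expanding,
# so the match stretches to the next ")" that is not followed by "?".
overreach = re.search(r"\(.*?\)(?!\?)", text)

print(overreach.group(0))  # (...)? and ()
```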

The solution is to avoid matching ( and ) in between parentheses:

\(([^()]*)\)(?!\?)

See the regex demo. Details:

  • \( - a ( char
  • ([^()]*) - Group 1: zero or more chars other than ( and )
  • \) - a ) char
  • (?!\?) - a negative lookahead that fails the match if there is a ? char immediately to the right of the current location ("fails" here means that the regex engine will backtrack to see if it can match the string in another way).
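As a quick check of the pattern broken down above, here is a sketch in Python (my choice of language, not the asker's) run against the test string from the C# answer earlier in this thread. The names `PATTERN`, `with_brackets` and `inner_only` are only illustrative:

```python
import re

PATTERN = re.compile(r"\(([^()]*)\)(?!\?)")

text = "(match this text) ignore this (ignore this)? and (match this) (and this)"

# group(0) is the whole match including the parentheses; group(1) is the inner text.
with_brackets = [m.group(0) for m in PATTERN.finditer(text)]
inner_only = [m.group(1) for m in PATTERN.finditer(text)]

print(with_brackets)  # ['(match this text)', '(match this)', '(and this)']
print(inner_only)     # ['match this text', 'match this', 'and this']
```

Because `[^()]*` cannot cross a parenthesis, backtracking after a failed lookahead has nowhere else to find a `)`, so `(ignore this)?` is skipped entirely.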

Upvotes: 1

Ghost Ops

Reputation: 1734

Try this regex.

It skips any bracketed text that is immediately followed by a question mark, and it leaves the unwanted trailing space out of the match:

\((.*?)\)(?!\?)
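For reference, a small sketch of this pattern applied to the two sample lines from the question. Python's `re` is an assumption on my part, and `sample` and `matches` are just illustrative names:

```python
import re

# The two sample lines from the question, joined by a newline.
sample = "(need to capture including brackets) ignore this\n(ignore this completely)?"

# Collect the full matches, brackets included.
matches = [m.group(0) for m in re.finditer(r"\((.*?)\)(?!\?)", sample)]

print(matches)  # ['(need to capture including brackets)']
```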


Upvotes: 7

Иван Зыков

Reputation: 493

Well, you can use something like this:

(\(.+\)\?)|(\(.*\))

You can't simply ignore the second string because, by your requirements, the two are the same: each of them contains brackets.

But you can define two groups in the regex and use only the second one:

(need to capture including brackets) ignore this $2

(ignore this completely)? $1
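A rough sketch of how the "use only the second group" idea might look in code. This assumes Python's `re`; the names `two_groups` and `wanted` are mine, and the input is the two sample lines from the question:

```python
import re

sample = "(need to capture including brackets) ignore this\n(ignore this completely)?"

# Group 1 catches the bracketed text followed by "?", group 2 catches the rest.
two_groups = re.compile(r"(\(.+\)\?)|(\(.*\))")

# Keep a match only when the second alternative (group 2) took part in it.
wanted = [m.group(2) for m in two_groups.finditer(sample) if m.group(2) is not None]

print(wanted)  # ['(need to capture including brackets)']
```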

Upvotes: 0
