Steve B

Reputation: 37710

Remove optional last parenthesis

I'm trying to parse a file name and remove a trailing number in parentheses (added when several files share the same base name), but only the last one.

Here are some expected results:

  1. Test ==> Test
  2. Test (1) ==> Test
  3. Test (1) (2) ==> Test (1)
  4. Test (123) (232) ==> Test (123)
  5. Test (1) foo ==> Test (1) foo

I tried this regex: (.*)( ?\(\d+\))+, but test 1 fails.

I also tried (.*)( ?\(\d+\))?, but only the 1st test succeeds.

I suspect there's something wrong with the quantifiers in the regex, but I can't work out exactly what.

How can I fix my regex?

Upvotes: 4

Views: 1006

Answers (5)

The fourth bird

Reputation: 163632

You could use your first pattern (.*)( ?\(\d+\))+ and replace with the first capturing group only.

To optimize it a bit, you could remove the quantifier + after the last group and omit the second capturing group.

This removes the last parenthesized number: (.*) first matches to the end of the string and then backtracks to the last occurrence of a space followed by digits in parentheses.

In the replacement use the first capturing group:

^(.*) \(\d+\)

Explanation

  • ^ Start of string
  • (.*) Capture group 1, match any char 0+ times
  • \(\d+\) after a literal space: match a space, then ( 1+ digits )

.NET Regex demo | C# demo
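
To see how the pattern behaves on the five inputs from the question, here is a minimal sketch. The class and method names are made up for illustration, and the second pattern with a trailing $ is my own assumption, not part of the answer above. Because the original pattern has no end anchor, it also rewrites "Test (1) foo" to "Test foo"; the anchored variant leaves that input alone.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastCounterDemo
{
    // Pattern from the answer: greedy (.*) backtracks to the last " (digits)".
    private static readonly Regex Unanchored = new Regex(@"^(.*) \(\d+\)");

    // Hypothetical variant (assumption): "$" requires the counter to end the name.
    private static readonly Regex Anchored = new Regex(@"^(.*) \(\d+\)$");

    // Replace the whole match with capture group 1.
    public static string StripUnanchored(string name) => Unanchored.Replace(name, "$1");

    public static string StripAnchored(string name) => Anchored.Replace(name, "$1");

    public static void Main()
    {
        string[] names = { "Test", "Test (1)", "Test (1) (2)", "Test (123) (232)", "Test (1) foo" };
        foreach (string name in names)
        {
            // Prints: input ==> unanchored result | anchored result
            Console.WriteLine("{0} ==> {1} | {2}", name, StripUnanchored(name), StripAnchored(name));
        }
    }
}
```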

Upvotes: 0

Dean Taylor

Reputation: 42051

As an alternative you could use an end of string / line anchor:

Regular Expression

\s*\(\d+\)$

Example usage

string resultString = null;
try {
    resultString = Regex.Replace(subjectString, @"\s*\(\d+\)$", "", RegexOptions.Multiline);
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}


Human Readable

  • Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) \s*
    • Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
  • Match the opening parenthesis character \(
  • Match a single character that is a “digit” (any decimal number in any Unicode script) \d+
    • Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
  • Match the closing parenthesis character \)
  • Assert position at the end of a line (at the end of the string or before a line feed) $
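
As a quick check against the expected results in the question, here is a small sketch that runs the same pattern over the five sample names. The class and method names are hypothetical, and I left out RegexOptions.Multiline on the assumption that each input is a single file name rather than a multi-line block.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingCounter
{
    // Remove optional whitespace plus "(digits)" only when it ends the string.
    public static string Strip(string fileName)
    {
        return Regex.Replace(fileName, @"\s*\(\d+\)$", "");
    }

    public static void Main()
    {
        string[] names = { "Test", "Test (1)", "Test (1) (2)", "Test (123) (232)", "Test (1) foo" };
        foreach (string name in names)
        {
            Console.WriteLine("{0} ==> {1}", name, Strip(name));
        }
    }
}
```

Since the pattern is anchored with $, a name such as "Test (1) foo" comes back unchanged.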

Upvotes: 2

Greg

Reputation: 11478

You can avoid regular expressions altogether. If you simply want to drop the last parenthesized group, you could do:

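string example = @"Test (1) (2)    (3) (4)";

public string GetPathName(string input)
{
     // Find the last opening parenthesis; without one there is nothing to strip.
     var position = input.LastIndexOf('(');
     if(position == -1)
          return input;

     // Keep everything before it (from input, not the sample field) and drop the trailing whitespace.
     return input.Substring(0, position).TrimEnd();
}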

The last opening parenthesis always marks the start of the trailing suffix, so why not find its index and take everything before it? I know you asked for a regular expression, but if you do not need one, why over-engineer it?
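
One caveat: cutting at the last '(' would also truncate a name like "Test (1) foo" from the question's fifth case. Below is a rough sketch of a stricter variant, still without regular expressions. It only strips when the name ends in "(digits)". The class and method names are hypothetical, and the extra checks are my assumption about what the question needs.

```csharp
using System;

public static class SuffixTrimmer
{
    public static string RemoveTrailingCounter(string input)
    {
        // Only act when the name ends with a closing parenthesis.
        if (!input.EndsWith(")", StringComparison.Ordinal))
            return input;

        int open = input.LastIndexOf('(');
        if (open == -1)
            return input;

        // Text between the last '(' and the final ')'.
        string inner = input.Substring(open + 1, input.Length - open - 2);
        if (inner.Length == 0)
            return input;

        // Require digits only, so "(draft)" or "()" is left alone.
        foreach (char c in inner)
        {
            if (!char.IsDigit(c))
                return input;
        }

        // Drop the counter and the whitespace in front of it.
        return input.Substring(0, open).TrimEnd();
    }

    public static void Main()
    {
        string[] names = { "Test", "Test (1)", "Test (1) (2)", "Test (123) (232)", "Test (1) foo" };
        foreach (string name in names)
        {
            Console.WriteLine("{0} ==> {1}", name, RemoveTrailingCounter(name));
        }
    }
}
```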

Upvotes: 0

Jan

Reputation: 43199

Just use a negative lookahead:

\s*\([^()]+\)(?!.*\([^()]+\))

See a demo on regex101.com.


In verbose form:

\s*              # optional whitespace
\([^()]+\)       # (...)
(?!.*\([^()]+\)) # negative lookahead: no further (...) may follow
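
In .NET the verbose form can be used as written with RegexOptions.IgnorePatternWhitespace. Here is a minimal sketch with hypothetical class and method names. Note two things about this pattern: [^()]+ accepts any content between the parentheses, not just digits, and because nothing anchors it to the end of the string, "Test (1) foo" becomes "Test foo", which differs from the fifth expected result in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastGroupRemover
{
    // Verbose pattern; whitespace and # comments are ignored by the regex engine.
    private static readonly Regex LastGroup = new Regex(@"
        \s*                # optional whitespace
        \([^()]+\)         # (...)
        (?!.*\([^()]+\))   # negative lookahead: no further (...) may follow
        ", RegexOptions.IgnorePatternWhitespace);

    public static string RemoveLast(string name)
    {
        return LastGroup.Replace(name, "");
    }

    public static void Main()
    {
        string[] names = { "Test", "Test (1)", "Test (1) (2)", "Test (123) (232)", "Test (1) foo" };
        foreach (string name in names)
        {
            Console.WriteLine("{0} ==> {1}", name, RemoveLast(name));
        }
    }
}
```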

Upvotes: 3

Emma

Reputation: 27743

My guess is that you want an expression similar to:

^(.*?)\s*(\(\s*\d+\)\s*)?$

Test

using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"^(.*?)\s*(\(\s*\d+\)\s*)?$";
        string input = @"Test
Test (1)
Test (1) (2)
Test (1) (2) (3)
Test (1) (2)    (3) (4) 
";
        RegexOptions options = RegexOptions.Multiline;

        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}

The expression is explained in the top right panel of regex101.com, if you wish to explore, simplify, or modify it, and in this link you can watch how it matches against some sample inputs.
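
The test above prints the whole match, which is the full line. To get the name without its last counter, read capture group 1 instead. Here is a small sketch of that, assuming one file name per call rather than a multi-line input; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BaseNameExtractor
{
    // Same expression as above; group 1 is the name without the trailing "(digits)".
    private static readonly Regex Pattern = new Regex(@"^(.*?)\s*(\(\s*\d+\)\s*)?$");

    public static string BaseName(string name)
    {
        Match m = Pattern.Match(name);
        // Fall back to the input if the pattern does not match at all.
        return m.Success ? m.Groups[1].Value : name;
    }

    public static void Main()
    {
        string[] names = { "Test", "Test (1)", "Test (1) (2)", "Test (123) (232)", "Test (1) foo" };
        foreach (string name in names)
        {
            Console.WriteLine("{0} ==> {1}", name, BaseName(name));
        }
    }
}
```

The lazy (.*?) grows one character at a time until the optional counter followed by the end of the string can match, so only the last counter is left out of group 1.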

Upvotes: 6
