spugm1r3

Reputation: 3649

Unusual Regex behavior in C#

I have a Regex that is behaving rather oddly and I can't figure out why. Original Regex:

Regex regex = new Regex(@"(?i)\d\.\d\dv");

This expression returns/matches an equivalent to 1.35V or 1.35v, which is what I want. However, it is not exclusive enough for my program and it returns some strings I don't need.

Modified Regex:

Regex rgx = new Regex(@"(?i)\d\.\d\dv\s");

Simply by adding '\s' to the expression, it matches/returns DDR3, which is not at all what I want. I'm guessing some sort of inversion is occurring, but I don't understand why and I can't seem to find a reference to explain it. All I wanted to do was add a whitespace character to the end of the expression to filter out a few more results.

Any help would be greatly appreciated.

EDIT: Here is a functional test case with a generic version of what is going on in my code. Just open a new WPF in Visual Studio, copy and paste, and it should repeat the results for you.

using System.Text.RegularExpressions;
using System.Windows;

namespace WpfApplication1
{
    /// <summary>
    /// Interaction logic for MainWindow.xaml
    /// </summary>
    public partial class MainWindow : Window
    {
        public MainWindow()
        {
            InitializeComponent();
        }

        Regex rgx1 = new Regex(@"(?i)\d\.\d\dv");
        Regex rgx2 = new Regex(@"(?i)\d\.\d\dv\s");

        string testCase = @"DDR3 Vdd            |            |            |            |            |    1.35v   |";

        string str = null;

        // Entry point: only process lines that match the first expression.
        public void IsMatch(string input)
        {
            Match rgx1Match = rgx1.Match(input);
            if (rgx1Match.Success)
            {
                GetInfo(input);
            }
        }

        // Matches both expressions against the whole line, then looks up
        // the matching token in the whitespace-split line.
        public void GetInfo(string input)
        {
            Match rgx1Match = rgx1.Match(input);
            Match rgx2Match = rgx2.Match(input);

            string[] tempArray = input.Split();
            int index = 0;

            if (rgx1Match.Success)
            {
                index = GetMatchIndex(rgx1, tempArray);
                str = tempArray[index].Trim();
                global::System.Windows.Forms.MessageBox.Show("First expression match: " + str);
            }
            if (rgx2Match.Success)
            {
                index = GetMatchIndex(rgx2, tempArray);
                str = tempArray[index].Trim();
                System.Windows.Forms.MessageBox.Show(input);
                global::System.Windows.Forms.MessageBox.Show("Second expression match: " + str);
            }
        }

        // Returns the index of a token that matches the expression.
        public int GetMatchIndex(Regex expression, string[] input)
        {
            int index = 0;

            for (int i = 0; i < input.Length; i++)
            {
                if (index < 1)
                {
                    Match rgxMatch = expression.Match(input[i]);
                    if (rgxMatch.Success)
                    {
                        index = i;
                    }
                }
            }
            return index;
        }

        private void button1_Click(object sender, RoutedEventArgs e)
        {
            string line;
            IsMatch(testCase);
        }
    }
}

The GetMatchIndex method is called a number of times in other parts of the code without incident; it is only with this one Regex that I've hit a stumbling block.

Upvotes: 0

Views: 166

Answers (2)

mellamokb

Reputation: 56769

The behavior you are seeing has entirely to do with your application logic, and very little to do with the regular expression. In GetMatchIndex, you are defaulting index = 0. So what happens if none of the entries in string[] input match? You get back index = 0, which is the index of DDR3, the first element in string[] input.

You don't see that behavior with the first regular expression, because it matches 1.35v. However, when you add \s to the end, it no longer matches any of the entries in the split input, so you get back the first one by default, which happens to be DDR3. The if (rgx2Match.Success) check doesn't help either: it tests the entire string first (which does match, because there is a space after 1.35v), and only then do you search for the index after splitting, which removed the spaces!
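To make the whole-line versus split-token difference concrete, here is a minimal console sketch. The class and method names (SplitDemo, LineMatches, AnyTokenMatches) and the shortened test line are assumptions made for illustration; only the pattern comes from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class SplitDemo
{
    // Pattern from the question: digit, dot, two digits, 'v', then one whitespace character.
    static readonly Regex Rgx2 = new Regex(@"(?i)\d\.\d\dv\s");

    // True when the pattern matches somewhere in the unsplit line.
    public static bool LineMatches(string line)
    {
        return Rgx2.IsMatch(line);
    }

    // True when at least one whitespace-separated token matches on its own.
    public static bool AnyTokenMatches(string line)
    {
        // Split() with no arguments splits on whitespace, so no token contains any.
        string[] tokens = line.Split();
        return tokens.Any(t => Rgx2.IsMatch(t));
    }

    static void Main()
    {
        string line = "DDR3 Vdd   |   1.35v   |";
        Console.WriteLine(LineMatches(line));     // prints True
        Console.WriteLine(AnyTokenMatches(line)); // prints False
    }
}
```

Because \s needs a whitespace character after the v, and every token produced by Split() has had its whitespace removed, the second check cannot succeed for any token.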

The fix is pretty simple: When you are returning an index from an array in a programming language that uses 0-based numbering, the standard way to represent "not found" is with -1 so it doesn't get confused with the valid result of 0. So default index to -1 instead and handle a result of -1 as a special case, i.e., display an error message to the user like "No matches".
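A minimal sketch of that fix, assuming a standalone helper class (MatchHelper and the Main method are hypothetical, not part of the original code). Note that it returns the first matching index, which differs slightly from the original loop: that loop keeps searching when the match is at index 0.

```csharp
using System;
using System.Text.RegularExpressions;

static class MatchHelper
{
    // Returns the index of the first element that matches the expression,
    // or -1 when no element matches.
    public static int GetMatchIndex(Regex expression, string[] input)
    {
        for (int i = 0; i < input.Length; i++)
        {
            if (expression.IsMatch(input[i]))
            {
                return i;
            }
        }
        return -1; // "not found" can no longer be confused with index 0
    }

    static void Main()
    {
        var rgx2 = new Regex(@"(?i)\d\.\d\dv\s");
        string[] tokens = "DDR3 Vdd | 1.35v |".Split();

        int index = GetMatchIndex(rgx2, tokens);
        if (index < 0)
        {
            Console.WriteLine("No matches");
        }
        else
        {
            Console.WriteLine("Match: " + tokens[index]);
        }
    }
}
```

The built-in Array.FindIndex(tokens, t => rgx2.IsMatch(t)) follows the same convention and also returns -1 when nothing matches, so it could replace the hand-written loop.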

Upvotes: 2

David Pfeffer

Reputation: 39833

Your question is incorrect:

new Regex(@"(?i)\d\.\d\dv\s").Match("DDR3").Success is false

In fact, the results seem to work exactly as you'd like.

Upvotes: 1
