Robinson

Reputation: 10132

Matching strings with wildcard

I would like to match strings with a wildcard (*), where the wildcard means "any". For example:

*X = string must end with X
X* = string must start with X
*X* = string must contain X

Also, some compound uses such as:

*X*YZ* = string contains X and contains YZ
X*YZ*P = string starts with X, contains YZ and ends with P.

Is there a simple algorithm to do this? I'm unsure about using regex (though it is a possibility).

To clarify, users will type the above into a filter box (as simple a filter as possible); I don't want them to have to write regular expressions themselves. So something I can easily transform from the above notation would be good.

Upvotes: 108

Views: 173428

Answers (11)

Pavel Khrapkin

Reputation: 75

Note that Regex.IsMatch returns true for XYZ when checking against Y*, because an unanchored regex can match anywhere in the string. To avoid this, I use the "^" anchor:

Regex.IsMatch(str1, "^" + str2.Replace("*", ".*?"));

So, full code to solve your problem is

bool isMatchStr(string str1, string str2)
{
    string s1 = str1.Replace("*", ".*?");
    string s2 = str2.Replace("*", ".*?");
    bool r1 = Regex.IsMatch(s1, "^" + s2);
    bool r2 = Regex.IsMatch(s2, "^" + s1);
    return r1 || r2;
}
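The effect of anchoring can be seen in a small sketch. The AnchorDemo class and its Matches helper are hypothetical names introduced only for illustration, and the sketch assumes the wildcard pattern contains no regex metacharacters other than * (nothing is escaped here).

```csharp
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Hypothetical helper: converts '*' to ".*?" and optionally anchors the result.
    // Assumption: the pattern contains no other regex metacharacters (no escaping is done).
    public static bool Matches(string input, string wildcard, bool anchorStart, bool anchorEnd)
    {
        string regex = wildcard.Replace("*", ".*?");
        if (anchorStart) regex = "^" + regex;   // match must begin at the start of the input
        if (anchorEnd) regex += "$";            // match must extend to the end of the input
        return Regex.IsMatch(input, regex);
    }
}
```

With no anchors, Matches("XYZ", "Y*", false, false) finds the Y in the middle; with only the "^" anchor the same call fails as intended. Note that "^" alone still accepts a pattern that matches just a prefix of the input (for example X*Y against XYZ), so adding "$" as well may be worth considering.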

Upvotes: 4

Wouter

Reputation: 2958

This is an improvement on the popular answer from @Dmitry Bychenko (https://stackoverflow.com/a/30300521/4491768). To match a literal ? or *, escape it in the pattern with a backslash: \? or \* (written "\\?" or "\\*" in a regular C# string literal).

Also, a precompiled regex improves performance when the pattern is reused.

public class WildcardPattern
{
    private readonly string _expression;
    private readonly Regex _regex;

    public WildcardPattern(string pattern)
    {
        if (string.IsNullOrEmpty(pattern)) throw new ArgumentNullException(nameof(pattern));
       
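        // Translate in a single pass so that an escaped wildcard (\* or \?)
        // next to an unescaped one cannot be mistaken for part of it.
        _expression = "^" + Regex.Replace(Regex.Escape(pattern), @"(\\\\)?\\([*?])",
            m => m.Groups[1].Success ? "\\" + m.Groups[2].Value   // \* or \? => literal character
               : m.Groups[2].Value == "*" ? ".*" : ".") + "$";    // * => any run, ? => any one character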
        _regex = new Regex(_expression, RegexOptions.Compiled);
    }

    public bool IsMatch(string value)
    {
        return _regex.IsMatch(value);
    }
}

Usage:

new WildcardPattern("Hello *\\**\\?").IsMatch("Hello W*rld?");
new WildcardPattern(@"Hello *\**\?").IsMatch("Hello W*rld?");

Upvotes: 3

Jamie Lester

Reputation: 988

For those using .NET Core 2.1+ or .NET 5+, you can use the FileSystemName.MatchesSimpleExpression method in the System.IO.Enumeration namespace.

string text = "X is a string with ZY in the middle and at the end is P";
bool isMatch = FileSystemName.MatchesSimpleExpression("X*ZY*P", text);

Both parameters are actually ReadOnlySpan<char>, but you can pass string arguments too. There is also an optional ignoreCase parameter if you want to turn case-insensitive matching on or off. It is case-insensitive by default, as Chris mentioned in the comments.
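As a minimal sketch of the case-sensitivity switch, the following wraps the two call forms. The SimpleExpressionDemo class and its method names are hypothetical; only FileSystemName.MatchesSimpleExpression and its ignoreCase parameter come from the framework.

```csharp
using System.IO.Enumeration;

public static class SimpleExpressionDemo
{
    // Relies on the default: ignoreCase is true when the argument is omitted.
    public static bool MatchIgnoringCase(string pattern, string text) =>
        FileSystemName.MatchesSimpleExpression(pattern, text);

    // Passes ignoreCase: false to request a case-sensitive comparison.
    public static bool MatchCaseSensitive(string pattern, string text) =>
        FileSystemName.MatchesSimpleExpression(pattern, text, ignoreCase: false);
}
```

With the text from the example above, the lower-case pattern "x*zy*p" should match through MatchIgnoringCase but not through MatchCaseSensitive, while "X*ZY*P" matches either way.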

Upvotes: 42

Dmitrii Bychenko

Reputation: 186833

Wildcards usually come in two types of jokers:

  ? - any character  (one and only one)
  * - any characters (zero or more)

so you can easily convert these rules into appropriate regular expression:

// If you want to implement both "*" and "?"
private static String WildCardToRegular(String value) {
  return "^" + Regex.Escape(value).Replace("\\?", ".").Replace("\\*", ".*") + "$"; 
}

// If you want to implement "*" only
private static String WildCardToRegular(String value) {
  return "^" + Regex.Escape(value).Replace("\\*", ".*") + "$"; 
}

And then you can use Regex as usual:

  String test = "Some Data X";

  Boolean endsWithEx = Regex.IsMatch(test, WildCardToRegular("*X"));
  Boolean startsWithS = Regex.IsMatch(test, WildCardToRegular("S*"));
  Boolean containsD = Regex.IsMatch(test, WildCardToRegular("*D*"));

  // Starts with S, ends with X, contains "me" and "a" (in that order) 
  Boolean complex = Regex.IsMatch(test, WildCardToRegular("S*me*a*X"));
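For the filter-box scenario in the question, the conversion could be applied to a list of items roughly as follows. This is a sketch under assumptions: the WildcardFilter class and Filter method are hypothetical names, and the choice of RegexOptions.IgnoreCase and RegexOptions.Singleline is an illustration rather than part of the answer above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WildcardFilter
{
    // Same conversion as above: escape everything, then re-enable ? and *.
    public static string WildCardToRegular(string value) =>
        "^" + Regex.Escape(value).Replace("\\?", ".").Replace("\\*", ".*") + "$";

    // Hypothetical helper: keeps the items that match the user's wildcard text.
    // The Regex is built once and reused for every item.
    public static List<string> Filter(IEnumerable<string> items, string wildcard)
    {
        var regex = new Regex(WildCardToRegular(wildcard),
                              RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return items.Where(item => regex.IsMatch(item)).ToList();
    }
}
```

Because of Regex.Escape, characters such as . in the user's input are treated literally: filtering "axb" and "a.bc" with a.b* keeps only "a.bc".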

Upvotes: 196

ZIELIK

Reputation: 21

For those working with C# and Excel (for example, when a worksheet name is only partially known), but not only them, here is my code using a wildcard (ddd*). In short: the code loops over all worksheet names, and if today's abbreviated weekday (ddd) matches the first three letters of a worksheet name, that name is stored in a string that is used after the loop.

using System;
using Microsoft.Office.Interop.Excel;
using System.Runtime.InteropServices;
using Range = Microsoft.Office.Interop.Excel.Range;
using System.Diagnostics;
using System.Reflection;
using System.IO;
using System.Text.RegularExpressions;

...
string weekDay = DateTime.Now.ToString("ddd"); // abbreviated weekday; the "*" wildcard is appended below

Workbook sourceWorkbook4 = xlApp.Workbooks.Open(LrsIdWorkbook, 0, false, 5, "", "", true, XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
Workbook destinationWorkbook = xlApp.Workbooks.Open(masterWB, 0, false, 5, "", "", true, XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);

            static String WildCardToRegular(String value)
            {
                return "^" + Regex.Escape(value).Replace("\\*", ".*") + "$";
            }

            string wsName = null;
            foreach (Worksheet works in sourceWorkbook4.Worksheets)
            {
                Boolean startsWithddd = Regex.IsMatch(works.Name, WildCardToRegular(weekDay + "*"));

                    if (startsWithddd == true)
                    {
                        wsName = works.Name.ToString();
                    }
            }

            Worksheet sourceWorksheet4 = (Worksheet)sourceWorkbook4.Worksheets.get_Item(wsName);

...

Upvotes: 0

Tim Schmelter

Reputation: 460288

You could use the VB.NET Like-Operator:

string text = "x is not the same as X and yz not the same as YZ";
bool contains = LikeOperator.LikeString(text,"*X*YZ*", Microsoft.VisualBasic.CompareMethod.Binary);  

Use CompareMethod.Text if you want to ignore the case.

You need to add using Microsoft.VisualBasic.CompilerServices; and add a reference to the Microsoft.VisualBasic.dll.

Since it's part of the .NET framework and will always be, it's not a problem to use this class.

Upvotes: 38

nb.duong

Reputation: 1

public class Wildcard
{
    private readonly string _pattern;

    public Wildcard(string pattern)
    {
        _pattern = pattern;
    }

    public static bool Match(string value, string pattern)
    {
        int start = -1;
        int end = -1;
        return Match(value, pattern, ref start, ref end);
    }

    public static bool Match(string value, string pattern, char[] toLowerTable)
    {
        int start = -1;
        int end = -1;
        return Match(value, pattern, ref start, ref end, toLowerTable);
    }

    public static bool Match(string value, string pattern, ref int start, ref int end)
    {
        return new Wildcard(pattern).IsMatch(value, ref start, ref end);
    }

    public static bool Match(string value, string pattern, ref int start, ref int end, char[] toLowerTable)
    {
        return new Wildcard(pattern).IsMatch(value, ref start, ref end, toLowerTable);
    }

    public bool IsMatch(string str)
    {
        int start = -1;
        int end = -1;
        return IsMatch(str, ref start, ref end);
    }

    public bool IsMatch(string str, char[] toLowerTable)
    {
        int start = -1;
        int end = -1;
        return IsMatch(str, ref start, ref end, toLowerTable);
    }

    public bool IsMatch(string str, ref int start, ref int end)
    {
        if (_pattern.Length == 0) return false;
        int pindex = 0;
        int sindex = 0;
        int pattern_len = _pattern.Length;
        int str_len = str.Length;
        start = -1;
        while (true)
        {
            bool star = false;
            if (_pattern[pindex] == '*')
            {
                star = true;
                do
                {
                    pindex++;
                }
                while (pindex < pattern_len && _pattern[pindex] == '*');
            }
            end = sindex;
            int i;
            while (true)
            {
                int si = 0;
                bool breakLoops = false;
                for (i = 0; pindex + i < pattern_len && _pattern[pindex + i] != '*'; i++)
                {
                    si = sindex + i;
                    if (si == str_len)
                    {
                        return false;
                    }
                    if (str[si] == _pattern[pindex + i])
                    {
                        continue;
                    }
                    if (si == str_len)
                    {
                        return false;
                    }
                    if (_pattern[pindex + i] == '?' && str[si] != '.')
                    {
                        continue;
                    }
                    breakLoops = true;
                    break;
                }
                if (breakLoops)
                {
                    if (!star)
                    {
                        return false;
                    }
                    sindex++;
                    if (si == str_len)
                    {
                        return false;
                    }
                }
                else
                {
                    if (start == -1)
                    {
                        start = sindex;
                    }
                    if (pindex + i < pattern_len && _pattern[pindex + i] == '*')
                    {
                        break;
                    }
                    if (sindex + i == str_len)
                    {
                        if (end <= start)
                        {
                            end = str_len;
                        }
                        return true;
                    }
                    if (i != 0 && _pattern[pindex + i - 1] == '*')
                    {
                        return true;
                    }
                    if (!star)
                    {
                        return false;
                    }
                    sindex++;
                }
            }
            sindex += i;
            pindex += i;
            if (start == -1)
            {
                start = sindex;
            }
        }
    }

    public bool IsMatch(string str, ref int start, ref int end, char[] toLowerTable)
    {
        if (_pattern.Length == 0) return false;

        int pindex = 0;
        int sindex = 0;
        int pattern_len = _pattern.Length;
        int str_len = str.Length;
        start = -1;
        while (true)
        {
            bool star = false;
            if (_pattern[pindex] == '*')
            {
                star = true;
                do
                {
                    pindex++;
                }
                while (pindex < pattern_len && _pattern[pindex] == '*');
            }
            end = sindex;
            int i;
            while (true)
            {
                int si = 0;
                bool breakLoops = false;

                for (i = 0; pindex + i < pattern_len && _pattern[pindex + i] != '*'; i++)
                {
                    si = sindex + i;
                    if (si == str_len)
                    {
                        return false;
                    }
                    char c = toLowerTable[str[si]];
                    if (c == _pattern[pindex + i])
                    {
                        continue;
                    }
                    if (si == str_len)
                    {
                        return false;
                    }
                    if (_pattern[pindex + i] == '?' && c != '.')
                    {
                        continue;
                    }
                    breakLoops = true;
                    break;
                }
                if (breakLoops)
                {
                    if (!star)
                    {
                        return false;
                    }
                    sindex++;
                    if (si == str_len)
                    {
                        return false;
                    }
                }
                else
                {
                    if (start == -1)
                    {
                        start = sindex;
                    }
                    if (pindex + i < pattern_len && _pattern[pindex + i] == '*')
                    {
                        break;
                    }
                    if (sindex + i == str_len)
                    {
                        if (end <= start)
                        {
                            end = str_len;
                        }
                        return true;
                    }
                    if (i != 0 && _pattern[pindex + i - 1] == '*')
                    {
                        return true;
                    }
                    if (!star)
                    {
                        return false;
                    }
                    sindex++;
                    continue;
                }
            }
            sindex += i;
            pindex += i;
            if (start == -1)
            {
                start = sindex;
            }
        }
    }
}

Upvotes: -1

geo le

Reputation: 7

C# Console application sample

Command line Sample:
C:/> App_Exe -Opy PythonFile.py 1 2 3
Console output:
Argument list: -Opy PythonFile.py 1 2 3
Found python filename: PythonFile.py

using System;
using System.Text.RegularExpressions;           //Regex

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            string cmdLine = String.Join(" ", args);

            bool bFileExtFlag = false;
            int argIndex = 0;
            Regex regex;
            foreach (string s in args)
            {
                //Search for the 1st occurrence of the "*.py" pattern
                regex = new Regex(@"(?s:.*)\056py", RegexOptions.IgnoreCase);
                bFileExtFlag = regex.IsMatch(s);
                if (bFileExtFlag == true)
                    break;
                argIndex++;
            };

            Console.WriteLine("Argument list: " + cmdLine);
            if (bFileExtFlag == true)
                Console.WriteLine("Found python filename: " + args[argIndex]);
            else
                Console.WriteLine("Python file with extension <.py> not found!");
        }


    }
}

Upvotes: -4

VirtualVDX

Reputation: 2381

Using WildcardPattern from System.Management.Automation may be an option.

var pattern = new WildcardPattern(patternString);
bool isMatch = pattern.IsMatch(stringToMatch);

The Visual Studio UI may not allow you to add the System.Management.Automation assembly to your project's References. Feel free to add it manually, as described here.

Upvotes: 21

Wiktor Stribiżew

Reputation: 627468

A wildcard * can be translated to the regex pattern .* or .*? (the lazy variant).

You might need singleline mode so that . also matches newline characters; in that case, you can use (?s) as part of the regex pattern.

You can set it for the whole or part of the pattern:

X* => @"X(?s:.*)"
*X => @"(?s:.*)X"
*X* => @"(?s).*X.*"
*X*YZ* => @"(?s).*X.*YZ.*"
X*YZ*P => @"(?s:X.*YZ.*P)"
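The difference singleline mode makes can be checked with a short sketch. The SinglelineDemo class and its method names are hypothetical, and the ^ and $ anchors are added here as an assumption so that the whole input must match.

```csharp
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Without (?s), '.' does not match '\n', so the wildcard cannot span lines.
    public static bool WithoutSingleline(string input) =>
        Regex.IsMatch(input, @"^X.*YZ.*P$");

    // With (?s), '.' matches every character, including '\n'.
    public static bool WithSingleline(string input) =>
        Regex.IsMatch(input, @"(?s)^X.*YZ.*P$");
}
```

For an input such as "X\nYZ\nP", only WithSingleline returns true; for single-line input such as "X-YZ-P" both variants agree.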

Upvotes: 7

Avinash Raj

Reputation: 174844

*X*YZ* = string contains X and contains YZ

@".*X.*YZ"

X*YZ*P = string starts with X, contains YZ and ends with P.

@"^X.*YZ.*P$"

Upvotes: 5
