Mystic

Reputation: 5124

Compare two strings by ignoring certain characters

I wonder if there is an easy way to check whether two strings match while ignoring certain characters in them. See the example below.

I can easily write such a method myself: use a regular expression to find the "wildcard" characters, replace them with a common character, and then compare the two strings str1 and str2. I am not looking for such implementations, but would like to know whether there are any .NET Framework classes that can take care of this. It seems like a common need, but I couldn't find any such method.

For example:

string str1 = "ABC-EFG";
string str2 = "ABC*EFG";

These two strings should be considered equal.

Thanks!

Upvotes: 4

Views: 6027

Answers (5)

Johann Blais

Reputation: 9469

I had the same requirement; the solution I used was based on the String.Compare method:

using System.Globalization;
bool equal = String.Compare(str1, str2, CultureInfo.InvariantCulture, CompareOptions.IgnoreSymbols) == 0; // 0 means the strings compare as equal
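CompareOptions.IgnoreSymbols tells the culture-aware comparer to skip symbols such as punctuation, whitespace and mathematical symbols, so "ABC-EFG" and "ABC*EFG" both reduce to "ABCEFG". As a rough illustration only (an approximation, not .NET's actual collation algorithm), the effect on inputs like these can be mimicked in Python by dropping every character that is not a letter or digit before comparing. The helper names `strip_symbols` and `equal_ignoring_symbols` are hypothetical.

```python
def strip_symbols(s: str) -> str:
    # Keep only letters and digits; punctuation, symbols and whitespace are
    # dropped, roughly approximating what CompareOptions.IgnoreSymbols ignores.
    return "".join(ch for ch in s if ch.isalnum())


def equal_ignoring_symbols(a: str, b: str) -> bool:
    # Hypothetical helper: equal if the strings agree once symbols are removed.
    return strip_symbols(a) == strip_symbols(b)


print(equal_ignoring_symbols("ABC-EFG", "ABC*EFG"))  # True
print(equal_ignoring_symbols("ABC-EFG", "ABCEFG"))   # True: the symbol's position is lost
print(equal_ignoring_symbols("ABC-EFG", "ABD-EFG"))  # False
```

The second call shows a side effect of ignoring symbols outright: a string with the symbol and the same string without it compare as equal, which may or may not be acceptable for your data.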

Upvotes: 10

mechanical_meat

Reputation: 169304

Not sure if this helps:

The Damerau-Levenshtein distance is one of several algorithms dealing with fuzzy string searching.

The DLD between "ABC-EFG" and "ABC*EFG" is 1, the DLD being "the minimum number of operations needed to transform one string into the other, where an operation is defined as an insertion, deletion, or substitution of a single character, or a transposition of two characters."

Of course, this algorithm would also return 1 for the two strings "ZBC-EFG" and "ABC-EFG", which is possibly not what you are looking for.

An implementation of the DLD, in Python, from http://paxe.googlecode.com/svn/trunk/paxe/Lib/Installer.py :

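def dist(s1, s2):
    # Strictly, this is the "optimal string alignment" (restricted) variant of the DLD.
    # d[(i, j)] is the distance between s1[:i+1] and s2[:j+1]; index -1 means the empty prefix.
    d = {}
    lenstr1 = len(s1)
    lenstr2 = len(s2)
    for i in range(-1, lenstr1 + 1):
        d[(i, -1)] = i + 1
    for j in range(-1, lenstr2 + 1):
        d[(-1, j)] = j + 1

    for i in range(0, lenstr1):
        for j in range(0, lenstr2):
            if s1[i] == s2[j]:
                cost = 0
            else:
                cost = 1
            d[(i, j)] = min(
                d[(i - 1, j)] + 1,         # deletion
                d[(i, j - 1)] + 1,         # insertion
                d[(i - 1, j - 1)] + cost,  # substitution
            )
            # i > 0 and j > 0 (not > 1): indices are 0-based, so a swap of the first two chars counts too
            if i > 0 and j > 0 and s1[i] == s2[j - 1] and s1[i - 1] == s2[j]:
                d[(i, j)] = min(d[(i, j)], d[(i - 2, j - 2)] + cost)  # transposition

    return d[(lenstr1 - 1, lenstr2 - 1)]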
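For reference, a table like this is meant to follow the optimal-string-alignment recurrence, a restricted form of the DLD in which no substring is edited more than once. Written 1-indexed for strings $a$ (length $m$) and $b$ (length $n$), with $d_{i,0} = i$, $d_{0,j} = j$, and $[P]$ equal to 1 when $P$ holds and 0 otherwise (the code's index $-1$ plays the role of 0 here), it reads:

```latex
d_{i,j} = \min\begin{cases}
  d_{i-1,j} + 1 & \text{(deletion)} \\
  d_{i,j-1} + 1 & \text{(insertion)} \\
  d_{i-1,j-1} + [a_i \neq b_j] & \text{(substitution)} \\
  d_{i-2,j-2} + [a_i \neq b_j] & \text{if } i, j > 1,\ a_i = b_{j-1},\ a_{i-1} = b_j \text{ (transposition)}
\end{cases}
\qquad \text{result} = d_{m,n}
```

The transposition term uses the same cost $[a_i \neq b_j]$ as the code; whenever the swap condition holds and $a_i = b_j$, that term can never beat the substitution term, so this agrees with the more common "+1" formulation.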

Upvotes: 5

Joe

Reputation:

You can, of course, test against a regex directly, without any substitution:

^[a-zA-Z]{3}.[a-zA-Z]{3}$
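A fixed pattern like this only checks the shape of a single string. A variation on the same idea, sketched in Python under the assumption that `-` and `*` are the only wildcard characters, builds the pattern from str1 itself: each wildcard position becomes `.` and every other character is escaped, and str2 must then match the whole pattern. The names `WILDCARDS` and `matches_with_wildcards` are hypothetical.

```python
import re

# Assumption: these are the characters the question calls "wildcards".
WILDCARDS = set("-*")


def matches_with_wildcards(template: str, candidate: str) -> bool:
    # Escape ordinary characters; turn each wildcard into "." (any single character).
    pattern = "".join("." if ch in WILDCARDS else re.escape(ch) for ch in template)
    # fullmatch anchors both ends; DOTALL lets "." also match a newline.
    return re.fullmatch(pattern, candidate, re.DOTALL) is not None


print(matches_with_wildcards("ABC-EFG", "ABC*EFG"))  # True
print(matches_with_wildcards("ABC-EFG", "ZBC*EFG"))  # False
print(matches_with_wildcards("ABC-EFG", "ABC-EF"))   # False: lengths differ
```

Note the asymmetry: a wildcard in the template matches any character, but a wildcard that appears only in the candidate must line up with a wildcard-matching position in the template.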

This seems like a common use for regex, so why avoid it?

Upvotes: 1

Travis Collins
Travis Collins

Reputation: 4020

Sorry, but I think either a regex or replacing the "wildcard" characters with a common character is going to be your best solution. Basically, the answers you said you didn't want to receive.
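For completeness, the replacement approach is only a few lines. This is a minimal sketch in Python, assuming `-` and `*` are the wildcard characters and using `?` as an arbitrary placeholder; `normalize` and `equal_modulo_wildcards` are illustrative names, not an existing API.

```python
import re

# Assumption: "-" and "*" are the wildcard characters; "?" is an arbitrary placeholder.
WILDCARD_RE = re.compile(r"[-*]")


def normalize(s: str) -> str:
    # Replace every wildcard with the same placeholder so they compare equal.
    return WILDCARD_RE.sub("?", s)


def equal_modulo_wildcards(a: str, b: str) -> bool:
    return normalize(a) == normalize(b)


print(equal_modulo_wildcards("ABC-EFG", "ABC*EFG"))  # True
print(equal_modulo_wildcards("ABC-EFG", "ABCDEFG"))  # False
```

Because the wildcard is replaced rather than dropped, its position still matters, so "ABC-EFG" and "ABCEFG" stay different. The placeholder should be a character that cannot otherwise appear in the data, or a literal "?" would be conflated with the wildcards.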

Upvotes: 1

Sean Bright
Sean Bright

Reputation: 120644

No, there is nothing in the framework itself that can do this.

Upvotes: 0
