Marco Toniut
Marco Toniut

Reputation: 588

Check if two strings share the same pattern of repeated characters

Is there an efficient Regex to assert that two string share the same pattern of repeated characters.

("tree", "loaa") => true
("matter", "essare") => false
("paper", "mime") => false
("acquaintance", "mlswmodqmdlp") => true
("tree", "aoaa") => false

Event if it's not through Regex, I'm looking for the most efficient way to perform the task

Upvotes: 15

Views: 5606

Answers (5)

jliu1999
jliu1999

Reputation: 12

just came across this same problem. and i wrote a piece of python code for it. it's rather straightforward, no extra modules to import. the basic idea is to translate the two given strings to a new pattern string, respectively, using the relationship between ascii characters and their corresponding numerical values. finally compare the two pattern strings.

def SamePattern(s1, s2):
  i = j = 97
  p1 = p2 = ""

  for index1, l1 in enumerate(s1):
    if l1 not in s1[0:index1]:
      p1 += chr(i)
      i += 1
    else:
      p1 += chr(97 + s1.index(l1))

  for index2, l2 in enumerate(s2): 
    if l2 not in s2[0:index2]:
      p2 += chr(j)
      j += 1
    else:
      p2 += chr(97 + s2.index(l2))
      
  if p1 == p2:
    return True
  else:
    return False


Upvotes: 0

JerKimball
JerKimball

Reputation: 16894

Only because I love LINQ: :)

void Main()
{
    Console.WriteLine(Map("tree") == Map("loaa"));
    Console.WriteLine(Map("matter") == Map("essare"));
    Console.WriteLine(Map("paper") == Map("mime"));
    Console.WriteLine(Map("acquaintance") == Map("mlswmodqmdlp"));
    Console.WriteLine(Map("tree") == Map("aoaa"));  
}

public string Map(string input)
{
    var seen = new Dictionary<char,int>();
    var index = 0;
    return string.Join(
      string.Empty, 
      input.Select(c =>seen.ContainsKey(c) ? seen[c] : seen[c] = index++));
}

Upvotes: 1

avishayp
avishayp

Reputation: 706

EDIT: Accepted code is now correct. This one will linger here as an alternative (which is less good in almost any sense).

    private static List<int> FindIndices(string str, char c, int ind)
    {
        var retval = new List<int>();
        int last = ind, temp;
        while (0 < (temp = str.IndexOf(c, last)))
        {
            retval.Add(temp);
            last = temp + 1;
        }           
        return retval;
    }

    public static int[] CanonicalForm(string s)
    {
        string table = String.Empty;
        var res = new int[s.Length];
        int marker = 0;
        int lastInd;

        for(int i=0; i < s.Length-1; ++i)
        {
            if (table.Contains(s[i]))
                continue;

            table += s[i];              
            lastInd = i+1;

            if (s.IndexOf(s[i], lastInd) > 0)
                res[i] = ++marker;
            else
                continue;

            foreach (var v in FindIndices(s, s[i], lastInd))
                res[v] = marker;
        }
        return res;
    }

And the comparison:

    public static bool ComparePatterns(string s1, string s2)
    {
        return ( (s1.Length == s2.Length) && CanonicalForm(s1).SequenceEqual(CanonicalForm(s2)) );
    }

So the point is to build a canonical form which can be later compared. This is not especially smart but does give correct results.

Upvotes: 1

Martin Ender
Martin Ender

Reputation: 44259

The easiest way is probably to walk through both strings manually at the same time and build up a dictionary (that matces corresponding characters) while you are doing it:

if(input1.Length != input2.Length)
    return false;
var characterMap = new Dictionary<char, char>();
for(int i = 0; i < input1.Length; i++)
{
    char char1 = input1[i];
    char char2 = input2[i];
    if(!characterMap.ContainsKey(char1))
    {
        if (characterMap.ContainsValue(char2))
            return false;
        characterMap[char1] = char2;
    }
    else
    {
        if(char2 != characterMap[char1])
            return false;
    }
}
return true;

In the same manner you could construct a regex. This is certainly not more efficient for a single comparison, but it might be useful if you want to check one repetition pattern against multiple strings in the future. This time we associate characters with their back-references.

var characterMap = new Dictionary<char, int>();
string regex = "^";
int nextBackreference = 1;
for(int i = 0; i < input.Length; i++)
{
    char character = input[i];
    if(!characterMap.ContainsKey(character))
    {
        regex += "(.)";
        characterMap[character] = nextBackreference;
        nextBackreference++;
    }
    else
    {
        regex += (@"\" + characterMap[character]);
    }
}
regex += "$";

For matter it will generate this regex: ^(.)(.)(.)\3(.)(.)$. For acquaintance this one: ^(.)(.)(.)(.)\1(.)(.)(.)\1\6\2(.)$. If could of course optimize this regular expression a bit afterwards (e.g. for the second one ^(.)(.)..\1.(.).\1\3\2$), but in any case, this would give you a reusable regex that checks against this one specific repetition pattern.

EDIT: Note that the given regex solution has a caveat. It allows mapping of multiple characters in the input string onto a single character in the test strings (which would contradict your last example). To get a correct regex solution, you would have to go a step further to disallow characters already matched. So acquaintance would have to generate this awful regular expression:

^(.)(?!\1)(.)(?!\1|\2)(.)(?!\1|\2|\3)(.)\1(?!\1|\2|\3|\4)(.)(?!\1|\2|\3|\4|\5)(.)(?!\1|\2|\3|\4|\5|\6)(.)\1\6\2(?!\1|\2|\3|\4|\5|\6|\7)(.)$

And I cannot think of an easier way, since you cannot use backreferences in (negated) character classes. So maybe, if you do want to assert this as well, regular expressions are not the best option in the end.

Disclaimer: I am not really a .NET guru, so this might not be the best practice in walking through arrays in building up a dictionary or string. But I hope you can use it as a starting point.

Upvotes: 12

Sid Holland
Sid Holland

Reputation: 2921

I don't know how to do it using a regex, but in code I would run through both strings one character at a time, comparing as I go and building a comparison list:

t = l
r = o
e = a
etc.

Prior to adding each comparison, I'd check to see if a character from the first string already existed in the list. If the corresponding character from the second string does not match the comparison list then the string patterns don't match.

Upvotes: 1

Related Questions