MadTest
MadTest

Reputation: 133

Matching the occurrence and pattern of characters of String2 in String1

I was asked this question in a phone interview for summer internship, and tried to come up with a n*m complexity solution (although it wasn't accurate too) in Java.

I have a function that takes 2 strings, suppose "common" and "cmn". It should return True based on the fact that 'c', 'm', 'n' are occurring in the same order in "common". But if the arguments were "common" and "omn", it would return False because even though they are occurring in the same order, but 'm' is also appearing after 'o' (which fails the pattern match condition)

I have worked over it using Hashmaps, and Ascii arrays, but didn't get a convincing solution yet! From what I have read till now, can it be related to Boyer-Moore, or Levenshtein Distance algorithms?

Hoping for respite at stackoverflow! :)

Edit: Some of the answers talk about reducing the word length, or creating a hashset. But per my understanding, this question cannot be done with hashsets because occurrence/repetition of each character in first string has its own significance. PASS conditions- "con", "cmn", "cm", "cn", "mn", "on", "co". FAIL conditions that may seem otherwise- "com", "omn", "mon", "om". These are FALSE/FAIL because "o" is occurring before as well as after "m". Another example- "google", "ole" would PASS, but "google", "gol" would fail because "o" is also appearing before "g"!

Upvotes: 8

Views: 2881

Answers (8)

FourOfAKind
FourOfAKind

Reputation: 2418

I tried it myself in a different way. Just sharing my solution.

public class PatternMatch {

public static boolean matchPattern(String str, String pat) {

    int slen = str.length();
    int plen = pat.length();
    int prevInd = -1, curInd;
    int count = 0;

    for (int i = 0; i < slen; i++) {

        curInd = pat.indexOf(str.charAt(i));


        if (curInd != -1) {

            if(prevInd == curInd)
                continue;
            else if(curInd == (prevInd+1))
                count++;
            else if(curInd == 0)
                count = 1;
            else count = 0;

            prevInd = curInd;
        }


        if(count == plen)
            return true;

    }

    return false;
}

public static void main(String[] args) {

    boolean r = matchPattern("common", "on");
    System.out.println(r);
}

}

Upvotes: 0

Adrian Regan
Adrian Regan

Reputation: 2250

I think this one is not a test of your computer science fundamentals, more what you would practically do within the Java programming environment.

You could construct a regular expression out of the second argument, i.e ...

omn -> o.*m[^o]*n

... and then test candidate string against this by either using String.matches(...) or using the Pattern class.

In generic form, the construction of the RegExp should be along the following lines.

exp -> in[0].* + for each x : 2 -> in.lenght { (in[x-1] + [^in[x-2]]* + in[x]) }

for example:

demmn -> d.*e[^d]*m[^e]*m[^m]*n

Upvotes: 0

raymi
raymi

Reputation: 1962

I think it's quite simple. Run through the pattern and fore every character get the index of it's last occurence in the string. The index must always increase, otherwise return false. So in pseudocode:

index = -1
foreach c in pattern
    checkindex = string.lastIndexOf(c)
    if checkindex == -1                   //not found
        return false
    if checkindex < index
        return false
    if string.firstIndexOf(c) < index     //characters in the wrong order
        return false
    index = checkindex
return true

Edit: you could further improve the code by passing index as the starting index to the lastIndexOf method. Then you would't have to compare checkindex with index and the algorithm would be faster.

Updated: Fixed a bug in the algorithm. Additional condition added to consider the order of the letters in the pattern.

Upvotes: 4

MadTest
MadTest

Reputation: 133

I had myself done this question in an inefficient manner, but it does give accurate result! I would appreciate if anyone can make out an an efficient code/algorithm from this!

Create a function "Check" which takes 2 strings as arguments. Check each character of string 2 in string 1. The order of appearance of each character of s2 should be verified as true in S1.

  1. Take character 0 from string p and traverse through the string s to find its index of first occurrence.
  2. Traverse through the filled ascii array to find any value more than the index of first occurrence.
  3. Traverse further to find the last occurrence, and update the ascii array
  4. Take character 1 from string p and traverse through the string s to find the index of first occurence in string s
  5. Traverse through the filled ascii array to find any value more than the index of first occurrence. if found, return False.
  6. Traverse further to find the last occurrence, and update the ascii array

As can be observed, this is a bruteforce method...I guess O(N^3)

public class Interview
{
    public static void main(String[] args)
{
    if (check("google", "oge"))
        System.out.println("yes");
    else System.out.println("sorry!");
}

 public static boolean check (String s, String p) 
{   

     int[] asciiArr =  new int[256];    
     for(int pIndex=0; pIndex<p.length(); pIndex++) //Loop1 inside p
     {
        for(int sIndex=0; sIndex<s.length(); sIndex++) //Loop2 inside s
        {
            if(p.charAt(pIndex) == s.charAt(sIndex))    
            {
                asciiArr[s.charAt(sIndex)] = sIndex; //adding char from s to its Ascii value

                for(int ascIndex=0; ascIndex<256; )     //Loop 3 for Ascii Array
                {
                    if(asciiArr[ascIndex]>sIndex)           //condition to check repetition
                    return false;
                    else ascIndex++;
                }
            }
        }
     }
    return true;
}
}

Upvotes: 1

topgun_ivard
topgun_ivard

Reputation: 8594

I would consider this as one of the worst pieces of code I have ever written or one of the worst code examples in stackoverflow...but guess what...all your conditions are met!
No algorithm could really fit the need, so I just used bruteforce...test it out...
And I could just care less for space and time complexity...my aim was first to try and solve it...and maybe improve it later!

public class SubString {

    public static void main(String[] args) {
        SubString ss = new SubString();
        String[] trueconditions = {"con", "cmn", "cm", "cn", "mn", "on", "co" };
        String[] falseconditions = {"com", "omn", "mon", "om"};

        System.out.println("True Conditions : ");
        for (String str : trueconditions) {
            System.out.println("SubString? : " + str + " : " + ss.test("common", str));
        }
        System.out.println("False Conditions : ");
        for (String str : falseconditions) {
            System.out.println("SubString? : " + str + " : " + ss.test("common", str));
        }
        System.out.println("SubString? : ole : " + ss.test("google", "ole"));
        System.out.println("SubString? : gol : " + ss.test("google", "gol"));
    }

    public boolean test(String original, String match) {
        char[] original_array = original.toCharArray();
        char[] match_array = match.toCharArray();

        int[] value = new int[match_array.length];
        int index = 0;
        for (int i = 0; i < match_array.length; i++) {
            for (int j = index; j < original_array.length; j++) {
                if (original_array[j] != original_array[j == 0 ? j : j-1] && contains(match.substring(0, i), original_array[j])) {

                    value[i] = 2;
                } else {
                    if (match_array[i] == original_array[j]) {
                        if (value[i] == 0) {
                            if (contains(original.substring(0, j == 0 ? j : j-1), match_array[i])) {
                                value[i] = 2;
                            } else {
                                value[i] = 1;
                            }
                        }
                        index = j + 1;
                    }
                }
            }
        }
        for (int b : value) {
            if (b != 1) {
                return false;
            }
        }
        return true;
    }

    public boolean contains(String subStr, char ch) {
        for (char c : subStr.toCharArray()) {
            if (ch == c) {
                return true;
            }
        }
        return false;
    }
}

-IvarD

Upvotes: 0

NirmalGeo
NirmalGeo

Reputation: 771

An excellent question and couple of hours of research and I think I have found the solution. First of all let me try explaining the question in a different approach.

Requirement:

Lets consider the same example 'common' (mainString) and 'cmn'(subString). First we need to be clear that any characters can repeat within the mainString and also the subString and since its pattern that we are concentrating on, the index of the character play a great role to. So we need to know:

  • Index of the character (least and highest)

Lets keep this on hold and go ahead and check the patterns a bit more. For the word common, we need to find whether the particular pattern cmn is present or not. The different patters possible with common are :- (Precedence apply )

  • c -> o
  • c -> m
  • c -> n
  • o -> m
  • o -> o
  • o -> n
  • m -> m
  • m -> o
  • m -> n
  • o -> n

At any moment of time this precedence and comparison must be valid. Since the precedence plays a huge role, we need to have the index of each unique character Instead of storing the different patterns.

Solution

First part of the solution is to create a Hash Table with the following criteria :-

  1. Create a Hash Table with the key as each character of the mainString
  2. Each entry for a unique key in the Hash Table will store two indices i.e lowerIndex and higherIndex
  3. Loop through the mainString and for every new character, update a new entry of lowerIndex into the Hash with the current index of the character in mainString.
  4. If Collision occurs, update the current index with higherIndex entry, do this until the end of String

Second and main part of pattern matching :-

  1. Set Flag as False
  2. Loop through the subString and for every character as the key, retreive the details from the Hash.
  3. Do the same for the very next character.
  4. Just before loop increment, verify two conditions

    If highestIndex(current character) > highestIndex(next character) Then
       Pattern Fails, Flag <- False, Terminate Loop
       // This condition is applicable for almost all the cases for pattern matching
    
    Else If lowestIndex(current character) > lowestIndex(next character) Then
       Pattern Fails, Flag <- False, Terminate Loop
       // This case is explicitly for cases in which patterns like 'mon' appear
    
  5. Display the Flag

N.B : Since I am not so versatile in Java, I did not submit the code. But some one can try implementing my idea

Upvotes: 2

Emil
Emil

Reputation: 13799

public class StringPattern {
    public static void main(String[] args) {
        String inputContainer = "common";
        String inputContainees[] = { "cmn", "omn" };
        for (String containee : inputContainees)
            System.out.println(inputContainer + " " + containee + " "
                    + containsCommonCharsInOrder(inputContainer, containee));

    }

    static boolean containsCommonCharsInOrder(String container, String containee) {
        Set<Character> containerSet = new LinkedHashSet<Character>() {
            // To rearrange the order
            @Override
            public boolean add(Character arg0) {
                if (this.contains(arg0))
                    this.remove(arg0);
                return super.add(arg0);
            }
        };
        addAllPrimitiveCharsToSet(containerSet, container.toCharArray());
        Set<Character> containeeSet = new LinkedHashSet<Character>();
        addAllPrimitiveCharsToSet(containeeSet, containee.toCharArray());

        // retains the common chars in order
        containerSet.retainAll(containeeSet);
        return containerSet.toString().equals(containeeSet.toString());

    }

    static void addAllPrimitiveCharsToSet(Set<Character> set, char[] arr) {
        for (char ch : arr)
            set.add(ch);
    }

}

Output:

common cmn true
common omn false

Upvotes: 0

Mike Samuel
Mike Samuel

Reputation: 120576

Isn't it doable in O(n log n)?

Step 1, reduce the string by eliminating all characters that appear to the right. Strictly speaking you only need to eliminate characters if they appear in the string you're checking.

/** Reduces the maximal subsequence of characters in container that contains no
 * character from container that appears to the left of the same character in
 * container.  E.g. "common" -> "cmon", and "whirlygig" -> "whrlyig".
 */
static String reduceContainer(String container) {
  SparseVector charsToRight = new SparseVector();  // Like a Bitfield but sparse.
  StringBuilder reduced = new StringBuilder();
  for (int i = container.length(); --i >= 0;) {
    char ch = container.charAt(i);
    if (charsToRight.add(ch)) {
      reduced.append(ch);
    }
  }
  return reduced.reverse().toString();
}

Step 2, check containment.

static boolean containsInOrder(String container, String containee) {
  int containerIdx = 0, containeeIdx = 0;
  int containerLen = container.length(), containeeLen == containee.length();
  while (containerIdx < containerLen && containeeIdx < containeeLen) {
    // Could loop over codepoints instead of code-units, but you get the point...
    if (container.charAt(containerIdx) == containee.charAt(containeeIdx)) {
      ++containeeIdx;
    }
    ++containerIdx;
  }
  return containeeIdx == containeeLen;
}

And to answer your second question, no, Levenshtein distance won't help you since it has the property that if you swap the arguments the output is the same, but the algo you want does not.

Upvotes: 0

Related Questions