Reputation: 21415

Longest common prefix based on elements

I have an array of string elements in an array:

["000", "1110", "01", "001", "110", "11"]

For an element in the array,

I want to find the previous element index with longest common prefix.
If I have multiple matching elements then select the nearest element index.
If none found then just select previous index.

Example:

["000", "1110", "01", "001", "110", "11"]

Output:
[0,1,1,1,2,5]

a) "000" - output is 0, because we do not have any previous elements.
b) "1110" - output is 1, no previous element with longest prefix so select previous index.
c) "01" - output is 1,"000" has longest prefix, so its index is 1.
d) "001" - output is 1, "000" has longest prefix, so its index is 1.
e) "110" - output is 2,  "1110" has longest prefix, so its index 2.
f) "11" - output 5, "110" is most nearest element with longest prefix so its index 5.

I am not able to understand what approach I need to take for this task. Can you please help me.

Upvotes: 3

Answers (3)

Hawk

Reputation: 2142

Naive solution based on the former question

commonPrefix is supposed to (according to the comment) the longest prefix in the array up to index n. What does that mean? You need to calculate all prefixes and pick the longest.

static String commonPrefix(String arr[], int n) {
    String longestPrefix = "";
    for (int i = 0; i < n; i++) {
        final String currentPrefix = commonPrefixUtil(arr[i], arr[n]);
        if (currentPrefix.length() > longestPrefix.length()) {
            longestPrefix = currentPrefix;
        }
    }
    return longestPrefix;
}

So that would yield "00" for arr = ["000", "1110", "01", "001", "110", "11"]; n = 3.

Now that we have the longest prefix, what? We need to find the closest index to n that starts with that prefix...

static int closestIndex(String[] arr, String longestPrefix, int n) {
    for (int i = n - 1; i >= 0; i--) {
        if (arr[i].startsWith(longestPrefix)) {
            return i + 1; // + 1 because the solution wants starting index with 1
        }
    }
    return 0;
}

How to put it together? Just call the two methods for each input

public static void main(String[] args) {
    String[] words = { "000", "1110", "01", "001", "110", "11" };
    int[] output = new int[words.length];

    for (int i = 0; i < words.length; i++) {
        final String longestPrefix = commonPrefix(words, i);
        output[i] = closestIndex(words, longestPrefix, i);
    }

    System.out.println(Arrays.toString(output));
}

You have deleted your commonPrefixUtil implementation that worked from the question, so I add my own:

static String commonPrefixUtil(String str1, String str2) {
    int shorterStringLength = Math.min(str1.length(), str2.length());
    int length = 0;
    for (; length < shorterStringLength; length++) {
        if (str1.charAt(length) != str2.charAt(length)) {
            break;
        }
    }
    return str1.substring(0, length);
}

Optimalized solution

I created a new solution using dynamic programming with tabulation (if I understand correctly), that is I use a hashmap of already all prefixes pointing to the indexes of the words where the prefix is from. The value of the Map is a sorted tree, so it can be really easily determined which word with the common prefix is closest to the current index. The HashMap ensures constant-time operations and the TreeSet guarantees log(n) time cost.

Explained more simply, I process the first letter of all words, and then the next etc. During this process I memorise where all prefix substrings are, while they are automatically sorted. I stop the loop after processing the last letter of the longest word.

public static void main(String[] args) {
    String[] words = { "000", "1110", "01", "001", "110", "11" };

    int[] result = new int[words.length];
    final int firstWordLength = words.length > 0 ? words[0].length() : 8;
    // prefix -> [indexes of prefix occurrence]
    HashMap<String, TreeSet<Integer>> prefixes = new HashMap<>(words.length * (firstWordLength + 1) * 2);
    int wordLength = 1;
    boolean isUpdatedResult;
    do { // O(k)
        isUpdatedResult = false;
        for (int wordIdx = 0; wordIdx < words.length; wordIdx++) { // O(n)
            if (words[wordIdx].length() < wordLength) {
                continue;
            }
            final String currentPrefix = words[wordIdx].substring(0, wordLength); // Java >= 7 update 6 ? O(k) : O(1)
            final TreeSet<Integer> prefixIndexes = prefixes.get(currentPrefix); // O(1)
            if (prefixIndexes != null) {
                // floor instead of lower, because we have put `wordIdx + 1` inside
                final Integer closestPrefixIndex = prefixIndexes.floor(wordIdx); // O(log n)
                if (closestPrefixIndex != null) {
                    result[wordIdx] = closestPrefixIndex;
                    isUpdatedResult = true;
                }
            }
            // take the previous index for the result if no match
            if (result[wordIdx] == 0) {
                result[wordIdx] = wordIdx;
            }
            final TreeSet<Integer> newPrefixIndexes = prefixes.computeIfAbsent(currentPrefix, p -> new TreeSet<>()); // O(1)
            // the result index must be + 1
            newPrefixIndexes.add(wordIdx + 1); // O(log n)
        }
        wordLength++;
    } while (isUpdatedResult);

    System.out.println(Arrays.toString(result));
}

I have marked all the operations with their big O time complexity. n is the number of words in the input array, i.e. words.length and k is the length of the longest word. According to Jon Skeet's post the time complexity of substring has changed in Java 7 update 6 to linear.

So we can calculate:

O(k) * O(n) * (O(log n) + O(k))

Hopefully, the code is understandable and I calculated the complexity correctly.

Upvotes: 1

David Eisenstat

Reputation: 65458

Using a trie makes it pretty easy to find the longest common prefix so far in time linear in the total number of characters in the input. In Python (sorry):

import collections


class Trie:
    def __init__(self):
        self._children = collections.defaultdict(Trie)
        self._previous_index = 0

    # Find the longest prefix of word that appears in the trie,
    # return the value of _previous_index at that node.
    def previous_index(self, word):
        node = self
        for letter in word:
            child = node._children.get(letter)
            if child is None:
                break
            node = child
        return node._previous_index

    # Ensure that each of the prefixes of word exists in the trie.
    # At each node corresponding to a prefix, set its _previous_index to index.
    def insert(self, index, word):
        self._previous_index = index
        node = self
        for letter in word:
            node = node._children[letter]
            node._previous_index = index


def longest_common_prefix_indexes(words):
    trie = Trie()
    for index_minus_one, word in enumerate(words):
        print(trie.previous_index(word))
        trie.insert(index_minus_one + 1, word)


longest_common_prefix_indexes(["000", "1110", "01", "001", "110", "11"])

Upvotes: 1

Ela Łabaj

Reputation: 74

I think this might work for you

This is funcion that returns the length of the longest common prefix of two Strings:

int commonPrefixLength(String s, String t) {
    if (s.length() == 0 || t.length() == 0) {
        return 0;
    }

    int commonPrefixLength = 0;
    int shorter = Math.min(s.length(), t.length());

    for (int i = 0; i < shorter; i++) {
        if (s.charAt(i) != t.charAt(i)) {
            break;
        }
        commonPrefixLength++;
    }

    return commonPrefixLength;
}

For every string in array you can check previous strings and choose the one with to longest common prefix:

    //indexing from 1
    String[] strings = new String[] {"", "000", "1110", "01", "001", "110", "11"};

    for (int i = 1; i < strings.length; i++) {
        int longestCommonPrefix = 0;
        int answer = 0;

        //for every previous string
        for (int j = i - 1; j > 0; j--) {
            int commonPrefix = commonPrefixLength(strings[j], strings[i]);
            if (commonPrefix > longestCommonPrefix) {
                longestCommonPrefix = commonPrefix;
                answer = j;
            }
        }

        //no common prefix found
        if (answer == 0) {
            answer = i - 1;
        }

        System.out.println(answer);
    }

Upvotes: 0

Longest common prefix based on elements

Answers (3)

Naive solution based on the former question

Optimalized solution

Related Questions