AverageJoe9000
AverageJoe9000

Reputation: 338

How to find better algorithm for my prefix matcher algorithm

I was solving online problem and the task was something like this:

There are two arrays: numbers and prefixes.

  • Array numbers contains numbers: “+432112345”, “+9990”, “+4450505”
  • Array prefixes contains prefixes: “+4321”, “+43211”, “+7700”, “+4452”, “+4”

Find longest prefix for each number. If no prefix found for number, match with empty string.

For example:

  • “+432112345” matches with the longest prefix “+43211” (not +4321, cause 43211 is longer).
  • “+9990” doesn't match with anything, so empty string "".
  • “+4450505” matches with “+4” (“+4452” doesn’t match because of the 2).

I came up with the most straight forward solution where I loop through each number with each prefix. So each time new number, I check prefixes, if some prefix is longer than last one, I will change.

Map<String, String> numsAndPrefixes = new HashMap<>();

for (String number : A) {
    for (String prefix : B) {
        if (number.contains(prefix)) {
            // if map already contains this number, check for prefixes. 
            // if longer exists, switch longer one
            if (numsAndPrefixes.containsKey(number)) {
                int prefixLength = prefix.length();
                int currentLen = numsAndPrefixes.get(number).length();
                if (prefixLength > currentLen) {
                    numsAndPrefixes.put(number, prefix);
                }
            } else {
                numsAndPrefixes.put(number, prefix);
            }
        } else if (!number.contains(prefix) && !numsAndPrefixes.containsKey(number)){
            numsAndPrefixes.put(number, "");
        }
    }
}

So it will have two for loops. I see that each time I am doing the same job over and over, e.g checking for prefixes. It works, but it is slow. The problem is that I can’t come up with anything better.

Could someone explain how they would approach to find better algorithm?

And more general, how do you proceed if you have somewhat working solution and trying to find better one? What knowledge am I still missing?

Upvotes: 2

Views: 1515

Answers (3)

zomega
zomega

Reputation: 2316

You are using .contains(). You should use .startsWith(). It's a lot faster.

Then in your else if you are checking what you already checked in the if.

This is only one approach on how to improve the algorithm:

Sort the prefixes:

+43211 +4321 +4452 +4 +7700

What is good about this? Well, it will always find the longest prefix first. You can exit the loop and don't have to look for longer prefixes.

Arrays.sort(prefixes, new Comparator<String>() {
@Override
public int compare​(String o1, String o2) {
    return o1.startsWith(o2) ? 1 : o1.compareTo(o2);
}
});

Map<String, String> numsAndPrefixes = new HashMap<>();

for (String number: numbers) {
    numsAndPrefixes.put(number, "");
    for (String prefix: prefixes) {
        if (number.startsWith(prefix, 1)) {
            numsAndPrefixes.put(number, prefix);
            break;
        }
    }
}

But if your number starts with +1 and there is no prefix it will continue checking all the prefixes with +2 +3 +4 ... which are obviously not matching. (Issue 1)

Also if your number starts with +9 the prefix will be found very late. (Issue 2)

How to fix this? Well you can save the indices where +1 starts, +2 starts, ...:

In our prefix list:

0      1     2     3  4  5   (index)
+1233  +123  +2233 +2 +3 +4

+2 starts at index [2] and +3 starts at index [4]. So when you want to know the prefix for a number starting with +2 you only have to check elements [2] and [3]. This will both fix issue 1 and 2.

It would also be possible to store the indices for more digits (for example where +13 starts).

Upvotes: 0

ardenit
ardenit

Reputation: 3890

The data structure you need is trie.

  • Add all prefixes in trie
  • For each string S in numbers:
    • Start from the root of trie
    • For each character in S:
      • If there is a link from current node, associated with current character, go by this link to the next node
      • If there is no link, then you reached the longest prefix - prefix stored in the current node is the answer for S

This algorithm works in O(length(prefixes) + length(numbers))

Upvotes: 2

Andreas
Andreas

Reputation: 159086

I would implement this using a TreeSet and the floor(E e) method.

String[] numbers = { "+432112345", "+9990", "+4450505" };
String[] prefixes = { "+4321", "+43211", "+7700", "+4452", "+4" };

TreeSet<String> prefixSet = new TreeSet<>(Arrays.asList(prefixes));
for (String number : numbers) {
    String prefix = prefixSet.floor(number);
    while (prefix != null && ! number.startsWith(prefix))
        prefix = prefixSet.floor(prefix.substring(0, prefix.length() - 1));
    if (prefix == null)
        prefix = "";
    System.out.println(number + " -> " + prefix);
}

Output

+432112345 -> +43211
+9990 -> 
+4450505 -> +4

Upvotes: 3

Related Questions