Sankalp
Sankalp

Reputation: 2824

Least number characters which can make a set of strings unique

What is the least number characters (starting from 1st character) which can make a set of strings unique.

For example, set of strings:

{
    'january',
    'february',
    'march',
    'april',
    'may',
    'june',
    'july'
}

Now, we cannot use just 1st character since 'j' is both in 'june' as well as 'july' (also, 'm' is in 'march' and 'may'). We also cannot use 1st 2 characters as 'ma' is both in 'march' and 'may'.

But, we can use 1st 3 characters!

What can be an optimal algorithm to return this number (besides an obvious brute force) ?

Upvotes: 1

Views: 208

Answers (2)

Aki Suihkonen
Aki Suihkonen

Reputation: 20037

You can sort the data and compare the index of the first character that differs between adjacent entries. This has complexity of O(N log N) or the complexity of sorting.

If you can incorporate the index calculation into the compare function, you can get the result as a side effect: the result should be the maximum index encountered by the compare function.

As Steve Jessop commented, it may be that there's no solution to the problem. One could indeed first calculate the minimum length of the entries. Another option, if one can freely implement a comparison function, is to return -1 if the comparison function has ever encountered an end of string.

int global_max = 0;
int compare(const char *a, const char *b) {
    int c=0;
    int result;  // result of comparison -- set as zero if an error has occured
    if (global_max < 0) return 0;
    do {
       if (*a==0 || *b==0) { global_max = -1; return 0; }
       c++;
    } while ((result = (*a++ - *b++)) == 0);
    if (c>global_max) global_max = c;
    return result;
}

Upvotes: 2

Ivaylo Strandjev
Ivaylo Strandjev

Reputation: 70939

First thing you can optimize is to do a binary search over the size of prefix needed. The function is monotonous- if a given number of chars n is enough, then any value larger than n will also do.

Second - you may compute rolling hashes for each string so that you can get the hash code for a given prefix of each string in constant time. Thus you will have to verify numbers(hash codes) for uniqueness instead of a string which of course is faster. I suggest a rolling hash like in rabin-karp.

Upvotes: 2

Related Questions