755
755

Reputation: 3091

Algorithm - check if any string in an array of strings is a prefix of any other string in the same array

I want to check if any string in an array of strings is a prefix of any other string in the same array. I'm thinking radix sort, then single pass through the array.

Anyone have a better idea?

Upvotes: 2

Views: 2440

Answers (3)

Aravind
Aravind

Reputation: 3199

To achieve a time complexity close to O(N2): compute hash values for each string.

Come up with a good hash function that looks something like:

  • A mapping from [a-z]->[1,26]
  • A modulo operation(use a large prime) to prevent overflow of integer

So something like "ab" gets computed as "12"=1*27+ 2=29

A point to note:

Be careful what base you compute the hash value on.For example if you take a base less than 27 you can have two strings giving the same hash value, and we don't want that.

Steps:

  1. Compute hash value for each string
  2. Compare hash values of current string with other strings:I'll let you figure out how you would do that comparison.Once two strings match, you are still not sure if it is really a prefix(due to the modulo operation that we did) so do a extra check to see if they are prefixes.
  3. Report answer

Upvotes: 1

akalenuk
akalenuk

Reputation: 3845

I think, radix sort can be modified to retrieve prefices on the fly. All we have to do is to sort lines by their first letter, storing their copies with no first letter in each cell. Then if the cell contains empty line, this line corresponds to a prefix. And if the cell contains only one entry, then of course there are no possible lines-prefices in it.

Here, this might be cleaner, than my english:

lines = [
"qwerty",
"qwe",
"asddsa",
"zxcvb",
"zxcvbn",
"zxcvbnm"
]

line_lines = [(line, line) for line in lines]

def find_sub(line_lines):
    cells = [ [] for i in range(26)]
    for (ine, line) in line_lines:
        if ine == "":
            print line
        else:
            index = ord(ine[0]) - ord('a')
            cells[index] += [( ine[1:], line )]
    for cell in cells:
        if len(cell) > 1:
            find_sub( cell )

find_sub(line_lines)

Upvotes: 1

AMADANON Inc.
AMADANON Inc.

Reputation: 5929

If you sort them, you only need to check each string if it is a prefix of the next.

Upvotes: 1

Related Questions