lllook
lllook

Reputation: 749

Counting substrings from given set of words

i have a set of strings (dictionary) and a string T, i have to count how many times i can build T from words in my dictionary

for example

dictionary contains: hello world llo he

and string T "helloworld"

output should be 2 because "hellowold" can be built from hello+world, he+llo+world

is there any effective algorithm for doing this?

Upvotes: 1

Views: 51

Answers (2)

jjm
jjm

Reputation: 6198

Here's a quick implementation in python:

from collections import defaultdict

def count_constructions( words, string ):
    # First we're going to make a map with 
    # positions mapped to lists of words starting
    # at that position in the string
    words_at_index = defaultdict( list )
    for word in words:    
        i = string.find(word)
        while i >= 0:
            words_at_index[i].append(word)
            i = string.find(word, i + 1)
    # I know there's a more pythonic way to do this, 
    # but the point here is to be able to inc count within
    # the auxilliary function
    count = [ 0 ]

    # This will find all of the ways to cover the remaining string
    # starting at start
    def recurse( start ):
        for w in words_at_index[start]:
            # w matches from string[start] to string[next_start]
            next_start = start + len(w)
            # see if we've covered the whole thing.        
            if next_start == len(string):
                count[0] += 1
                # we could also emit the words forming the string here
            else: 
                # otherwise, count the times we can cover it from
                # next_start on
                recurse(next_start)

    recurse(0)
    return count[0]


dictionary = [ 'hello', 'world', 'llo', 'he' ]
word = "helloworld"

print( count_constructions( dictionary, word ) )

Upvotes: 1

Lajos Arpad
Lajos Arpad

Reputation: 76551

I would start by getting a sub-set from your dictionary, containing only the words which might be part of the word you are searching for. Then, with the remaining words you can do a backtrack implementation, it should not use too much resources, as the set you will run your backtrack on will be very small.

Upvotes: 0

Related Questions