maplemaple
maplemaple

Reputation: 1715

data structure for query substring and contains substring in a collection of strings

Recently, I met two use cases

  1. efficiently query whether a given string word is a substring of some elements of a collection of strings.

  2. efficiently query whether some elements of a collection of strings are substring of a given string word.

class StringSet {
    
    void add(String s);

    void contains(String s);

    void remove(String s);

    List<String> getAllSubstringsOfGivenWord(String s);
    
    List<String> getAllStringsContainsGivenWord(String s);

}

For example, the collection of String is {"abc", "dab", "bcd"}, then getAllStringsContainsGivenWord("ab") --> ["abc", "dab"], getAllSubstringsOfGivenWord("dabc") --> ["abc", "dab"]

Which kind of data structure suits for these kinds of use cases?

Upvotes: 2

Views: 160

Answers (1)

Fede
Fede

Reputation: 4016

What level of efficiency are you looking for? How complex can the add and remove operations be?

Are there any constraints or limits to the problem space? eg: the number or length of the words, the max length of the substring to look for, letter-casing sensitivity, alphabets to consider.

I'd suggest taking a look into Trie, Suffix Tree, and Generalized Suffix Tree.

Upvotes: 1

Related Questions