Reputation: 31
Essentially, I have a Set of words, about 250,000 of them and want to be able to return a list of which ones are found in a given string.
eg. input string is 'APPLEASEDITION', I want to return
[APP,APPLE,PLEA, PLEAS,PLEASE,PLEASED,lEA,LEAS,LEASE,LEASED,EA,EAS,EASE,EASED,AS,SEDITION,EDITION,IT,TI,ON]
I came up with this code, which works faster than the method mentioned above for shorter input strings (up to 15 characters), but doubles in execution time with each added letter:
const findWords = (instring, solutions = null) => {
if (!solutions) solutions = new Set();
if (!instring) {
return new Set();
}
if (words[instring]) {
solutions.add(instring);
}
const suffix = instring.slice(1);
const prefix = instring.slice(0, instring.length - 1);
if (!solutions.has(prefix))
solutions = new Set([...solutions, ...findWords(prefix, solutions)]);
if (!solutions.has(suffix))
solutions = new Set([...solutions, ...findWords(suffix, solutions)]);
return solutions;
};
Wondering if anyone can help me out optimizing the code?
Edit:
I made a different solution, it works much better
const getAllSubstrings = (str) => {
let result = [];
for (let i = 0; i < str.length; i++) {
for (let j = i + 1; j < str.length + 1; j++) {
result.push(str.slice(i, j));
}
}
return result;
}
const findWords = (instring) => {
const solutions = []
let subs = getAllSubstrings(instring)
for (let sub of subs) {
if (words[sub])
solutions.push(sub)
}
return solutions
}
Still open to possible improvements, but this works well enough for my use case
Upvotes: 2
Views: 135
Reputation: 10226
As it stands your logic assumes your input starts or ends with the phrase, but doesn't consider words in the middle - you'll need to generate permutations
Convert your dictionary to a hash where the words are keys - O(n) => O(1) - you can check if possible words are in the dictionary by checking dictionary[possibleWord]
You could convert your array of dictionary words into a binary search tree or a trie - there might be a performance benefit to converting your source text to a collection of BSTs/Tries, where each one represents a possible word/permutation, and then comparing BSTs/Tries rather than strings, but I'm not sure how that'd be faster than string comparison at the moment.
You can limit the length to the max length of a given permutation to the words in your dictionary. You'll end up with a lot of permutations, but possibly less than you have currently.
As the comments state you may want to do this server side for more power/in a language more efficient than JS, or using WASM.
Some javascript libraries that have binary search tree tools:
Upvotes: 2