Sara Ree
Sara Ree

Reputation: 3543

Remove an array of words from given string in the most efficient way

I want the most efficient (in case of speed) solution to remove array of given words from the given string:

So far I have this not working solution:

const excludeWordList = ['the', 'in', 'a', 'an'];

run("the wall into")
run("paintings covered the wall another words into this")

function run(speech) {

    for(let a = 0; a < excludeWordList.length; a++) {
        speech = speech.replaceAll(excludeWordList[a], '');
    }
    console.log(speech);
}

As you see the code has three major issues:

  1. It removes characters inside the words not just the single words

  2. The result is not a trimmed string we have extra spaces inside words of the result too

  3. The code is not the most efficient way I think!!!, because I need to loop through all the excludeWordList array.

I wrote my function as my and as you see the Gainza function is the most efficient function in this case:

enter image description here

Upvotes: 0

Views: 198

Answers (5)

customcommander
customcommander

Reputation: 18901

The problem with a solution using whitespaces as word separator is that it will most likely fail when you use punctuation for example e.g.

'the, Beattles'.split(' ').includes('the')
//=> false

Instead you should use \b (word boundary):

const excludes = ['the', 'in', 'a', 'an'];
const re = new RegExp('\\b(?:'+excludes.join('|')+')\\b', 'g');

console.log("the.wall.into".replace(re, ''));
console.log("paintings covered the, wall another words into this".replace(re, ''));

Upvotes: 1

Kinglish
Kinglish

Reputation: 23654

Here's a way that doesn't loop, replaces all excluded words with a single regex, and finished up with a trim and extra space cleanup. This should be the fasted method. You can map the array into a Regex and preserve word boundaries \b, which have to be escaped when building it dynamically

const excludeWordList = ['the', 'in', 'a', 'an'];
const reg = new RegExp(excludeWordList.map(w => `\\b${w}\\b`).join('|'), 'g')

run("the wall into")
run("paintings covered the wall another words into this")

function run(speech) {
  speech = speech.replaceAll(reg, '').trim().replace(/\s\s+/g, ' ');
  console.log(speech);
}

Upvotes: 2

Tushar Shahi
Tushar Shahi

Reputation: 20451

You can try using split() and filter().

Edit: indexOf() complexity is O(N) because of linear search. Since we have a fixed set of words we want to exclude, converting to a set is ideal.

new Set() is also O(N), but since it is being done only once and your run() will be called more often it makes sense here. With set, .has() has O(1) complexity.

const excludeWordList = ['the', 'in', 'a', 'an'];

const excludeWordSet = new Set(excludeWordList);

run("the wall into")
run("paintings covered the wall another words into this")

function run(speech) {
speech = speech.split(' ').filter((a) => {
return (!excludeWordSet.has(a))
}).join(' ');
  
console.log(speech);
}

filter() still has O(N) complexity. join() has O(N). so this is still O(N^2), same as your initial attempt.

Upvotes: 1

Aviad
Aviad

Reputation: 3584

I'd use the a filter & set approach to minimize computation time (instead of includes or indexOf that iterate the whole array)

const excluded = new Set(['the', 'in', 'a', 'an']);


function run(speech) {
    return speech.split(' ')
           .filter(word => !excluded.has(word))
           .join(' ');
}


run("the wall into")
run("paintings covered the wall another words into this")

Upvotes: 2

const excludeWordList = ['the', 'in', 'a', 'an'];

run("the wall into")
run("paintings covered the wall another words into this")

function run(speech) {

    const result = speech.split(' ').filter(word=>!excludeWordList.includes(word)).join(' ')
    console.log(result);
}

Upvotes: 1

Related Questions