Sandeep Thomas

Reputation: 4727

Replace all occurrences of specific words in a sentence based on an array of words

I have an array like this:

var excludeWords = ["A", "ABOUT", "ABOVE", "ACROSS", "ALL", "ALONG", "AM", "AN", "AND", "ANY", "ASK", "AT", "AWAY", "CAN", "DID", "DIDN'T", "DO", "DON'T", "FOR", "FROM", "HAD", "HAS", "HER", "HIS", "IN", "INTO", "IS", "IT", "NONE", "NOT", "OF", "ON", "One", "OUT", "SO", "SOME", "THAT", "THE", "THEIR", "THERE", "THEY", "THESE", "THIS", "TO", "TWIT", "WAS", "WERE", "WEREN'T", "WHICH", "WILL", "WITH", "WHAT", "WHEN", "WHY"];

I'm trying to write a function, or find any quick way, to remove the occurrences of the above words from a sentence. How can I achieve that quickly without looping?

The way I'm doing it now:

var excludeWords = ["A", "ABOUT", "ABOVE", "ACROSS", "ALL", "ALONG", "AM", "AN", "AND", "ANY", "ASK", "AT", "AWAY", "CAN", "DID", "DIDN'T", "DO", "DON'T", "FOR", "FROM", "HAD", "HAS", "HER", "HIS", "IN", "INTO", "IS", "IT", "NONE", "NOT", "OF", "ON", "One", "OUT", "SO", "SOME", "THAT", "THE", "THEIR", "THERE", "THEY", "THESE", "THIS", "TO", "TWIT", "WAS", "WERE", "WEREN'T", "WHICH", "WILL", "WITH", "WHAT", "WHEN", "WHY"];
var sentence = "The first solution does not work for any UTF-8 alphaben. (It will cut text such as Привіт). I have managed to create function which do not use RegExp and use good UTF-8 support in JavaScript engine. The idea is simple if symbol is equal in uppercase and lowercase it is special character. The only exception is made for whitespace.";

$(excludeWords).each(function(index, item) {
  var s = new RegExp(item, "gi");
  sentence = sentence.replace(s, "");
});
alert(sentence);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

But is there any better solution than looping?

Based on a comment, a little more detail:

It should never remove part of a word. It should only replace a full word.

Upvotes: 1

Views: 678

Answers (6)

Himanshu Teotia

Reputation: 2185

It would be better to split on word boundaries.

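// Build the lookup set once; the list is mostly upper-case, so compare upper-cased tokens.
var excluded = new Set(excludeWords.map(w => w.toUpperCase()));
sentence = sentence.split(/\b/)
  .filter(word => !excluded.has(word.toUpperCase()))
  .join('')
  // The g flag collapses every run of whitespace, not just the first one.
  .replace(/\s\s+/g, ' ').trim();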
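
To see what the split actually produces, here is a minimal sketch on a made-up input (the string and the variable names are illustrative assumptions, not taken from the question). It also shows a limitation: \b treats an apostrophe as a boundary, so entries such as DIDN'T or DON'T never show up as a single token.

```javascript
// Hypothetical input, used only to show how split(/\b/) tokenizes text.
const tokens = "Don't stop".split(/\b/);
console.log(tokens); // [ 'Don', "'", 't', ' ', 'stop' ]

// The apostrophe splits the word, so a whole-token lookup for "DON'T" cannot succeed.
const hasWholeWord = tokens.some(t => t.toUpperCase() === "DON'T");
console.log(hasWholeWord); // false
```

If the apostrophe words matter, a single regex with an alternation is probably the easier route.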

Upvotes: 1

Nina Scholz

Reputation: 386610

You could add a word boundary \b on each side of the word, so that only whole words are replaced.

var excludeWords = ["A", "ABOUT", "ABOVE", "ACROSS", "ALL", "ALONG", "AM", "AN", "AND", "ANY", "ASK", "AT", "AWAY", "CAN", "DID", "DIDN'T", "DO", "DON'T", "FOR", "FROM", "HAD", "HAS", "HER", "HIS", "IN", "INTO", "IS", "IT", "NONE", "NOT", "OF", "ON", "One", "OUT", "SO", "SOME", "THAT", "THE", "THEIR", "THERE", "THEY", "THESE", "THIS", "TO", "TWIT", "WAS", "WERE", "WEREN'T", "WHICH", "WILL", "WITH", "WHAT", "WHEN", "WHY"],
    sentence = "The first solution does not work for any UTF-8 alphaben. (It will cut text such as Привіт). I have managed to create function which do not use RegExp and use good UTF-8 support in JavaScript engine. The idea is simple if symbol is equal in uppercase and lowercase it is special character. The only exception is made for whitespace.";

sentence  = excludeWords.reduce(function(r, s) {
    return r.replace(new RegExp('\\b' + s + '\\b', "gi"), "");
}, sentence);

console.log(sentence);
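
One caveat, sketched under assumptions: in JavaScript, \b is defined in terms of ASCII word characters, so it will not delimit words written in other alphabets (such as the Привіт in the sample sentence). If the exclude list ever contained such words, Unicode-aware lookarounds might be an option. The variable names below are hypothetical, and the second pattern needs an engine with lookbehind and \p{...} support.

```javascript
// Hypothetical example: one non-ASCII word to exclude, taken from the sample sentence.
const word = "Привіт";
const text = word + " world";
const upper = word.toUpperCase();

// \b only knows ASCII word characters, so it finds no boundary around Cyrillic letters.
const asciiBoundary = new RegExp("\\b(?:" + upper + ")\\b", "gi");
console.log(text.replace(asciiBoundary, "") === text); // true, nothing was removed

// Assumed alternative: treat any Unicode letter, digit or underscore as a word character.
const wordChar = "[\\p{L}\\p{N}_]";
const unicodeBoundary = new RegExp(
  "(?<!" + wordChar + ")(?:" + upper + ")(?!" + wordChar + ")",
  "giu"
);
console.log(text.replace(unicodeBoundary, "").trim()); // "world"
```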

Upvotes: 1

adeneo

Reputation: 318212

You could split on spaces and use a filter to check whether each word is in the array.

var excludeWords = ["A", "ABOUT", "ABOVE", "ACROSS", "ALL", "ALONG", "AM", "AN", "AND", "ANY", "ASK", "AT", "AWAY", "CAN", "DID", "DIDN'T", "DO", "DON'T", "FOR", "FROM", "HAD", "HAS", "HER", "HIS", "IN", "INTO", "IS", "IT", "NONE", "NOT", "OF", "ON", "One", "OUT", "SO", "SOME", "THAT", "THE", "THEIR", "THERE", "THEY", "THESE", "THIS", "TO", "TWIT", "WAS", "WERE", "WEREN'T", "WHICH", "WILL", "WITH", "WHAT", "WHEN", "WHY"];

var sentence = "The first solution does not work for any UTF-8 alphaben. (It will cut text such as Привіт). I have managed to create function which do not use RegExp and use good UTF-8 support in JavaScript engine. The idea is simple if symbol is equal in uppercase and lowercase it is special character. The only exception is made for whitespace.";

var res = sentence.split(" ").filter(w=>!excludeWords.includes(w.toUpperCase())).join(" ");

console.log(res)

If you simply replace strings with a regex, you'll run into issues. For instance, "solution" ends up as "luti", because both "so" and "on" are in the array, so you need to compare complete words instead.
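
A minimal sketch of that difference, using a shortened word list and a made-up sentence (both are assumptions chosen to keep the output easy to check):

```javascript
// Hypothetical short inputs, chosen only to show the difference.
const words = ["SO", "ON"];
const text = "The solution is on";

// Plain substring replacement: "so" and "on" are also removed from inside "solution".
const naive = words.reduce(
  (acc, w) => acc.replace(new RegExp(w, "gi"), ""),
  text
);

// Whole-word comparison: only the standalone "on" is dropped.
const wholeWords = text
  .split(" ")
  .filter(w => !words.includes(w.toUpperCase()))
  .join(" ");

console.log(naive);      // "The luti is "
console.log(wholeWords); // "The solution is"
```

Keep in mind that splitting on spaces leaves punctuation attached to the words, so a token like "(It" would not be matched against "IT".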

Upvotes: 1

Himanshu Upadhyay

Reputation: 6565

You can join the array values into one string, build a single regex from it, apply that to the sentence, and then split the result back into an array.

var excludeWords = ["A", "ABOUT", "ABOVE", "ACROSS", "ALL", "ALONG", "AM", "AN", "AND", "ANY", "ASK", "AT", "AWAY", "CAN", "DID", "DIDN'T", "DO", "DON'T", "FOR", "FROM", "HAD", "HAS", "HER", "HIS", "IN", "INTO", "IS", "IT", "NONE", "NOT", "OF", "ON", "One", "OUT", "SO", "SOME", "THAT", "THE", "THEIR", "THERE", "THEY", "THESE", "THIS", "TO", "TWIT", "WAS", "WERE", "WEREN'T", "WHICH", "WILL", "WITH", "WHAT", "WHEN", "WHY"];

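// Join with "|" so the regex matches any one of the words, not the whole list as one phrase.
var array_to_string = excludeWords.join('|');
var s = new RegExp('\\b(?:' + array_to_string + ')\\b', "gi");
// `sentence` is assumed to already hold the input text.
sentence = sentence.replace(s, "");
var excludewords_updated = sentence.split(' ');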

So this is how you can do it without looping.

Upvotes: 1

Masmiix

Reputation: 19

You can use preg_replace_all("~[\"](.*)[\"]~isuU", $data, $found)

Upvotes: -1

georg

Reputation: 214969

You're almost there. The trick is to combine all the words into one big regexp and do the replacement just once. The \b anchors (written as \\b inside the string) ensure that you're actually replacing whole words and not just substrings.

var excludeWords = ["A", "ABOUT", "ABOVE", "ACROSS", "ALL", "ALONG", "AM", "AN", "AND", "ANY", "ASK", "AT", "AWAY", "CAN", "DID", "DIDN'T", "DO", "DON'T", "FOR", "FROM", "HAD", "HAS", "HER", "HIS", "IN", "INTO", "IS", "IT", "NONE", "NOT", "OF", "ON", "One", "OUT", "SO", "SOME", "THAT", "THE", "THEIR", "THERE", "THEY", "THESE", "THIS", "TO", "TWIT", "WAS", "WERE", "WEREN'T", "WHICH", "WILL", "WITH", "WHAT", "WHEN", "WHY"];

var sentence = "The first solution does not work for any UTF-8 alphaben. (It will cut text such as Привіт). I have managed to create function which do not use RegExp and use good UTF-8 support in JavaScript engine. The idea is simple if symbol is equal in uppercase and lowercase it is special character. The only exception is made for whitespace.";

var re = new RegExp(`\\b(${excludeWords.join('|')})\\b`, 'gi');
sentence = sentence.replace(re, "");
console.log(sentence);

Note that this can leave consecutive spaces in the string. These can be easily removed with replace(/\s+/g, ' ').trim().
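
Putting the two steps together, a reusable version might look like the sketch below. The helper names removeWords and escapeRegExp are hypothetical, and the escaping and the length-based sort are additions of mine that only matter for lists containing regex metacharacters or entries that are prefixes of others (for example a made-up list with both DON and DON'T).

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// removeWords is a hypothetical helper name, not part of the answer above.
function removeWords(sentence, words) {
  // Longer alternatives first, so a word with an apostrophe is not cut short
  // by a shorter entry that happens to be its prefix.
  const alternation = words
    .slice()
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join("|");
  const re = new RegExp("\\b(?:" + alternation + ")\\b", "gi");
  // One pass to remove the words, one pass to tidy up the whitespace.
  return sentence.replace(re, "").replace(/\s+/g, " ").trim();
}

console.log(removeWords("They didn't ask for the solution",
  ["DID", "DIDN'T", "ASK", "FOR", "THE", "THEY"])); // "solution"
```

There is still a loop inside the regex engine, of course, but the sentence is scanned once instead of once per word.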

Upvotes: 4
