AndreSites
AndreSites

Reputation: 113

fastest way to compare a string with a array of strings

I have a array, let's say:

   var myArray = ["ibira", "garmin", "hide", "park", "parque", "corrida", "trote", "personal", "sports", "esportes", "health", "saúde", "academia"];
   var myString = "I went to the park with my garmin watch";

What is the fast way to check if my String has any of the words in myArray?

Bellow is my code, but im not sure if it would be the best way to go...

   function score(arKeywords, frase) {
      if (frase == undefined) {
        return 0;
      } else {
          var indice = 0;
          var indArray = arKeywords.length;
          var sentencaMin = frase.toLowerCase();
          for (i = 0; i < indArray; i++) {
              if (sentencaMin.search(arKeywords[i]) > 0) { indice++; }
          }
          return indice;
      }
  }

Please help me anyone. That function will run in A LOT of strings!

Thank you all :)

Upvotes: 5

Views: 1272

Answers (7)

1983
1983

Reputation: 5963

For speed, try a precompiled RegExp:

var re = RegExp('\\b' + myArray.join('\\b|\\b') + '\\b', gi);
var i, matches;
for(i=0; i<lotsOfStrings.length; i+=1){
    // note that this retrieves the total number
    // of matches, not unique matches, which may
    // not be what you want
    matches = lotsOfStrings[i].match(re);
    // do something with matches
}

Note that the RegExp is constructed outside the loop.

Alternatively, to simply test for a match:

var re = RegExp('\\b' + myArray.join('\\b|\\b') + '\\b', gi);
var i, matched;
for(i=0; i<lotsOfStrings.length; i+=1){
    matched = re.test(lotsOfStrings[i]);
    // do something with matched
}

Upvotes: 1

le_m
le_m

Reputation: 20228

What is the fast way to check if my String has any of the words in myArray?

Compile your myArray into regex and test for myString - please see FizzyTea's answer.

If you don't want to use regex for whatever reason, the second fastest alternative is to use String.includes() and Array.some():

 var myArray = ["ibira", "garmin", "hide", "park", "parque", "corrida", "trote", "personal", "sports", "esportes", "health", "saúde", "academia"];
 var myString = "I went to the park with my garmin watch";

 console.log(myArray.some(e => myString.includes(e)));

For a performance comparison of different methods, see https://jsfiddle.net/usq9zs61/5/

Results over 100000 iterations in Chrome 48 / Firefox 46, Ubuntu:

  • compiledregextest (FizzyTea): 16.046ms / 21.84ms
  • someincludes (this answer): 76.707ms / 62.55ms
  • compiledregexmatch (FizzyTea): 104.682ms / 170.58ms
  • someset (Comment by Bergi): 488.474ms / 749.46ms
  • splitregexsome (David Thomas): 529.529ms / 677.20ms
  • filterset (Comment by Bergi): 742.857ms / 875.86ms
  • ahocorasick (ordi): 1790.654ms / 1642.19ms

The Aho-Corasick algorithm proposed by orid has the best run-time complexity, but the alternative methods execute faster on current Javascript engines unless your myArray of search strings is much bigger.

Upvotes: 4

David Thomas
David Thomas

Reputation: 253308

Based on this sentence, from the question:

What is [a] way to check if my String has any of the words in myArray?

(Emphasis mine.)

I'd suggest the following, which will test if "some" of the words in the supplied string are present in the supplied array. This – theoretically – stops comparing once there is a match of any of the words from the string present in the array:

var myArray = ["ibira", "garmin", "hide", "park", "parque", "corrida", "trote", "personal", "sports", "esportes", "health", "saúde", "academia"],
  myString = "I went to the park with my garmin watch";

function anyInArray(needles, haystack) {

  // we split the supplied string ("needles") into words by splitting
  // the string at the occurrence of a word-boundary ('\b') followed
  // one or more ('+') occurrences of white-space ('\s') followed by
  // another word-boundary:
  return needles.split(/\b\s+\b/)
    // we then use Array.prototype.some() to work on the array of
    // words, to assess whether any/some of the words ('needle') 
    // - using an Arrow function - are present in the supplied
    // array ('haystack'), in which case Array.prototype.indexOf()
    // would return the index of the found-word, or -1 if that word
    // is not found:
    .some(needle => haystack.indexOf(needle) > -1);
    // at which point we return the Boolean, true if some of the
    // words were found, false if none of the words were found.
}

console.log(anyInArray(myString, myArray));

var myArray = ["ibira", "garmin", "hide", "park", "parque", "corrida", "trote", "personal", "sports", "esportes", "health", "saúde", "academia"],
  myString = "I went to the park with my garmin watch";

function anyInArray(needles, haystack) {
  return needles.split(/\b\s+\b/).some(needle => haystack.indexOf(needle) > -1);
}

console.log(anyInArray(myString, myArray));

JS Fiddle demo.

References:

Upvotes: 4

orip
orip

Reputation: 75427

The most efficient solution for this problem would probably be the Aho-Corasick algorithm, which searches in O(size of string being searched) after having created the initial DAG from the list of strings in O(sum of sizes of strings in the list).

Upvotes: 0

omarjmh
omarjmh

Reputation: 13888

Here is one way to do it: https://jsbin.com/fiqegu/1/edit?js,console

var result = myString.split(' ').filter(function(word) {
  return myArray.indexOf(word) > -1;
});

This will return the words

Obviously, you can get the count by adding .length to the end of the above code:

var result = myString.split(' ').filter(function(word) {
  return myArray.indexOf(word) > -1;
}).length;

Upvotes: 0

Andreas Louv
Andreas Louv

Reputation: 47099

You could join the array with | and construct a regex, properly not the fastest, but quote pretty:

function score(myArray, text) {
  var regex = new RegExp('\\b(' + myArray.join('|') + ')\\b', 'gi');
  var matches = text.match(regex);
  return matches ? matches.length : 0;
}

And usage:

var myArray = ["ibira", "garmin", "hide", "park", "parque", "corrida", "trote", "personal", "sports", "esportes", "health", "saúde", "academia"];
var myString = "I went to the park with my garmin watch";

score(myArray, myString); // 2
score(myArray, 'Ibira is doing sports in the Park'); // 3

This assumes myArray doesn't contain any special characters.

Upvotes: 0

Barmar
Barmar

Reputation: 780798

If you just want to know if there are any matches, you can convert the array into a regular expression.

My regexp also uses \b to match word boundaries, so park won't match if the string contains spark.

var myArray = ["ibira", "garmin", "hide", "park", "parque", "corrida", "trote", "personal", "sports", "esportes", "health", "saúde", "academia"];
var myString = "I went to the park with my garmin watch";


function score(arKeywords, frase) {
  if (frase == undefined) {
    return 0;
  } else {
    var re = new RegExp('\\b(' + arKeywords.join('|') + ')\\b', 'i');
    return !!frase.match(re);
  }
}

console.log(score(myArray, myString));

Upvotes: 0

Related Questions