user479947
user479947

Reputation:

What's the most efficient algorithm to find the indexes of multiple characters inside a string (javascript)?

I'm looking for the fastest method I can use to search a body of text for the indexes of multiple characters.

For example:

searchString = 'abcdefabcdef'; 
searchChars  = ['a','b'];
// returns {'a':[0,6], 'b':[1,7]}

Upvotes: 4

Views: 656

Answers (3)

user479947
user479947

Reputation:

After timing a few single pass algorithms and Guffa's regex, I ended up going with this:

function findIndexesMultiPass(str,find) {
  var x, output = {};

  for (var i = 0; i < find.length; i++) {
    output[find[i]] = [];
    x = 0;
    while ((x = str.indexOf(find[i], x)) > -1) {
      output[find[i]].push(x++);
    }
  }

  return output;
}

var searchString = "abcd abcd abcd";
var searchChars  = ['a', 'b'];
var result = findIndexesMultiPass(searchString, searchChars);
// {'a':[0,5,10], 'b':[1,6,11]}

This turned out to be pretty slow:

function findIndexesOnePass(str,find) {
  var output = {};

  for (var i = 0; i < find.length; i++) {
    output[find[i]] = [];
  }

  for (var i = 0; i < str.length; i++) {
    var currentChar = str.charAt(i);
    if (output[currentChar] !== undefined) {
      output[currentChar].push(i);
    }
  }

  return output;
}

var searchString = "abcd abcd abcd";
var searchChars  = ['a', 'b'];
var result = findIndexesOnePass(searchString, searchChars);
// {'a':[0,5,10], 'b':[1,6,11]}

Rough times (indexes of 3 characters)

Google Chrome (Mac)
findIndexesMultiPass: 44ms
findIndexesOnePass: 799ms
findIndexesRegEx: 95ms

Safari
findIndexesMultiPass: 48ms
findIndexesOnePass: 325ms
findIndexesRegEx: 293ms

Firefox
findIndexesMultiPass: 56ms
findIndexesOnePass: 369ms
findIndexesRegEx: 786ms

Upvotes: 0

Guffa
Guffa

Reputation: 700212

You should be able to use a regular expression to find all occurances of each character. Something like:

function findIndexes(find, str) {
  var output = {};
  for (var i = 0; i < find.length; i++) {
    var m = [];
    var r = new RegExp('.*?' + find[i], 'g');
    var ofs = -1;
    while ((x = r.exec(str)) != null) {
      ofs += x[0].length;
      m.push(ofs);
    }
    output[find[i]] = m;
  }
  return output;
}

Edit:

Did some changes, and now it works. :) However, as Javascript doesn't have a matches method to get all matches at once, it's not really any improvment over using indexOf... :P

Edit 2:

However, you can use a regular expression to find any of the characters, so you only need to loop the string once instead of once for each character. :)

function findIndexes(find, str) {
  var output = {};
  for (var i = 0; i < find.length; i++) output[find[i]] = [];
  var r = new RegExp('.*?[' + find.join('') + ']', 'g');
  var ofs = -1;
  while ((x = r.exec(str)) != null) {
    ofs += x[0].length;
    output[x[0].substr(x[0].length-1,1)].push(ofs);
  }
  return output;
}

Upvotes: 1

Will Hartung
Will Hartung

Reputation: 118603

Assuming few letters to search for and many letters to search against (i.e. low number of letter, long strings), the latter is the most efficient, since you only go through the string once, and then test each letter.

The other one goes through the string as many times as there are letters to search for.

Upvotes: 0

Related Questions