Reputation: 2274

How can I speed up my array search function?

I am working on a dictionary application written with React Native.

To filter the array from the search box, I wrote the function below. It works quite well when I test with a 2000-word list, but when the word list grows to many thousands of words, the search becomes really slow.

So, how can I improve this search function?

// Filter the array as the user types (search)

let filteredWords = []
if(this.state.searchField != null)
{
  filteredWords = this.state.glossaries.filter(glossary => {
    return glossary.word.toLowerCase().includes(this.state.searchField.toLowerCase());
  })
}

Upvotes: 3

Answers (2)

Wazner

Reputation: 3102

There are multiple factors that are making this code slow:

You're using filter() with a lambda. This adds function call overhead for each item being searched.
You're calling toLowerCase() on both strings before calling includes(). This allocates two new string objects for every comparison.
You're calling includes(). For some reason, the includes() method is not as well optimized as indexOf() in some browsers.

`for` loop (-11%)

Instead of using the filter() method, I recommend creating a new Array and using a for loop to fill it.

const glossaries = this.state.glossaries;
const searchField = this.state.searchField;
const filteredWords = [];   

for (let i = 0; i < glossaries.length; i++) {
  if (glossaries[i].word.toLowerCase().includes(searchField.toLowerCase())) {
    filteredWords.push(glossaries[i]);
  }
}

toLowerCase allocations (-45%)

Memory allocation is expensive because JavaScript relies on garbage collection to free unused memory. When a garbage collection runs, the program can be paused while the collector looks for memory that is no longer in use.

You can get rid of the toLowerCase() inside the search loop completely by making a lowercase copy of the glossary every time the glossary is updated, which I assume is not often.

// When you build the glossary
this.state.glossaries = ...;
this.state.searchGlossaries = this.state.glossaries.map(g => g.word.toLowerCase());

You can also take the toLowerCase() on searchField out of the loop by calling it once before the loop starts. After these changes, the code will look like:

const glossaries = this.state.glossaries;
const searchGlossaries = this.state.searchGlossaries;
const searchField = this.state.searchField.toLowerCase();
const filteredWords = [];

for (let i = 0; i < glossaries.length; i++) {
  if (searchGlossaries[i].includes(searchField)) {
    filteredWords.push(glossaries[i]);
  }
}

`indexOf()` instead of `includes()` (-13%)

I am not really sure why this is the case, but tests show that indexOf is a lot faster than includes.

const glossaries = this.state.glossaries;
const searchGlossaries = this.state.searchGlossaries;
const searchField = this.state.searchField.toLowerCase();
const filteredWords = [];

for (let i = 0; i < glossaries.length; i++) {
  if (searchGlossaries[i].indexOf(searchField) !== -1) {
    filteredWords.push(glossaries[i]);
  }
}

Overall, performance improved by about 70%. The percentages come from https://jsperf.com/so-question-perf
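
To check that the optimized loop returns the same results as the original filter() call, and to time both on your own data, a minimal sketch could look like the one below. The names filterNaive, filterOptimized, and time, as well as the generated word list, are made up for illustration. Outside a component, the glossary and its lowercase copy are passed in as plain arguments instead of being read from this.state.

```javascript
// Hypothetical sample data: 50,000 entries shaped like { word: '...' }
const glossaries = [];
for (let i = 0; i < 50000; i++) {
  glossaries.push({ word: 'Word' + i.toString(36) });
}

// Lowercase copy, built once whenever the glossary changes
const searchGlossaries = glossaries.map(g => g.word.toLowerCase());

// The approach from the question: filter() + toLowerCase() + includes()
function filterNaive(glossaries, searchField) {
  return glossaries.filter(g =>
    g.word.toLowerCase().includes(searchField.toLowerCase()));
}

// for loop + precomputed lowercase copy + indexOf()
function filterOptimized(glossaries, searchGlossaries, searchField) {
  const needle = searchField.toLowerCase();
  const filteredWords = [];
  for (let i = 0; i < glossaries.length; i++) {
    if (searchGlossaries[i].indexOf(needle) !== -1) {
      filteredWords.push(glossaries[i]);
    }
  }
  return filteredWords;
}

// Rough timing helper; the numbers vary by engine and device
function time(label, fn) {
  const start = Date.now();
  const result = fn();
  console.log(label + ': ' + (Date.now() - start) + ' ms, ' +
    result.length + ' matches');
  return result;
}

const naive = time('filter + includes',
  () => filterNaive(glossaries, 'RdA'));
const optimized = time('for + indexOf',
  () => filterOptimized(glossaries, searchGlossaries, 'RdA'));
```

Both calls should return the same entries in the same order; only the elapsed time should differ.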

Optimize the algorithm

In the comments you said you would like an example of optimizations that can be done when the requirements are loosened to only match words that start with the search text. One way to do this is a binary search.

Let's take the code from above as a starting point. We sort the glossaries before we store them in the state. For case-insensitive sorting, JavaScript exposes the Intl.Collator constructor. It provides the compare(x, y) method, which returns:

Return value    | Meaning
negative value  | x is less than y
zero            | x is equal to y
positive value  | x is greater than y
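
As a quick illustration of those return values, here is a small sketch with made-up sample words. It assumes a runtime with Intl support and a default locale that orders the basic Latin letters alphabetically.

```javascript
const collator = new Intl.Collator(undefined, { sensitivity: 'base' });

// With sensitivity 'base', strings that differ only in case compare as equal
const sameWord = collator.compare('apple', 'Apple');   // zero
const before = collator.compare('apple', 'banana');    // negative value
const after = collator.compare('cherry', 'Banana');    // positive value
console.log(sameWord, before, after);

// The default sort compares UTF-16 code units, so uppercase letters come
// first; passing collator.compare gives a case-insensitive order instead
const words = ['banana', 'Cherry', 'apple', 'Banana'];
const defaultSorted = [...words].sort();
const collatorSorted = [...words].sort(collator.compare);
console.log(defaultSorted);
console.log(collatorSorted);
```

collator.compare is already bound to its collator, which is why it can be passed to sort() directly.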

And the resulting code:

// Static in the file
const collator = new Intl.Collator(undefined, {
  sensitivity: 'base'
});

// Compares only the first searchText.length characters of word, so a
// result of zero means that word starts with searchText
function comparePrefix(word, searchText) {
  return collator.compare(word.slice(0, searchText.length), searchText);
}

function binarySearch(glossaries, searchText) {
  let lo = 0;
  let hi = glossaries.length - 1;

  while (lo <= hi) {
    const mid = (lo + hi) / 2 | 0;
    const comparison = comparePrefix(glossaries[mid].word, searchText);

    if (comparison < 0) {
      lo = mid + 1;
    }
    else if (comparison > 0) {
      hi = mid - 1;
    }
    else {
      return mid;
    }
  }

  return -1;
}

// When you build the glossary
this.state.glossaries = ...;
this.state.glossaries.sort(function(x, y) {
  return collator.compare(x.word, y.word);
});

// When you search
const glossaries = this.state.glossaries;
const searchField = this.state.searchField.toLowerCase();
const filteredWords = [];

// let, not const: idx is moved while collecting the matches
let idx = binarySearch(glossaries, searchField);

if (idx !== -1) {
  // Walk back to the first matching word, seeing as the binary search
  // will end up somewhere in the middle of the matches
  while (idx > 0 && comparePrefix(glossaries[idx - 1].word, searchField) === 0) {
    idx--;
  }

  // Add each matching word to the filteredWords
  while (idx < glossaries.length && comparePrefix(glossaries[idx].word, searchField) === 0) {
    filteredWords.push(glossaries[idx]);
    idx++;
  }
}
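
A possible variation, not the code from this answer: a lower-bound binary search lands directly on the first matching word, so no backward walk is needed. The sketch below is self-contained and takes the sorted array as an argument instead of reading this.state. The names comparePrefix, lowerBound, and findByPrefix and the sample words are made up for illustration, and it assumes that comparing prefixes with the collator is consistent with the order produced by sorting whole words, which should hold for ordinary alphabetic words.

```javascript
const collator = new Intl.Collator(undefined, { sensitivity: 'base' });

// Compare only the first prefix.length characters of word against prefix
function comparePrefix(word, prefix) {
  return collator.compare(word.slice(0, prefix.length), prefix);
}

// Lower-bound binary search: index of the first entry whose prefix is not
// less than the search text (glossaries.length if there is none)
function lowerBound(glossaries, prefix) {
  let lo = 0;
  let hi = glossaries.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (comparePrefix(glossaries[mid].word, prefix) < 0) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Collect the run of consecutive entries that start with prefix
function findByPrefix(glossaries, prefix) {
  const matches = [];
  let i = lowerBound(glossaries, prefix);
  while (i < glossaries.length &&
         comparePrefix(glossaries[i].word, prefix) === 0) {
    matches.push(glossaries[i]);
    i++;
  }
  return matches;
}

// Hypothetical sample data, sorted with the same collator used for searching
const glossaries = ['banana', 'Apple', 'apricot', 'band', 'cherry', 'Bandit']
  .map(word => ({ word }))
  .sort((x, y) => collator.compare(x.word, y.word));

const result = findByPrefix(glossaries, 'BAN').map(g => g.word);
console.log(result); // the three entries that start with "ban"
```

The search takes O(log n) comparisons to find the start of the run, plus one comparison per match, instead of touching every word in the glossary.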

Upvotes: 9

ibrahim mahrir

Reputation: 31712

As the question doesn't seem to belong on CodeReview, I think there are a few things that you can do to make your code drastically faster [citation needed]:

Cache that call to this.state.searchField.toLowerCase() as you don't need to call it on every iteration.
Use regular old for loops instead of flashy-but-slow Array functions.

And here is the final result:

let filteredWords = []
if(this.state.searchField != null) {
    let searchField = this.state.searchField.toLowerCase(),
        theArray = this.state.glossaries;                          // cache this too

    for(let i = 0, l = theArray.length; i < l; ++i) {
        if(theArray[i].word.toLowerCase().includes(searchField)) {
            filteredWords.push(theArray[i]);
        }
    }
}

Edit:

If you want to search for glossaries whose word starts with searchField, then use indexOf === 0 instead of includes as the condition, like this:

if(theArray[i].word.toLowerCase().indexOf(searchField) === 0) {
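
A possible alternative, not part of the original answer: String.prototype.startsWith() only inspects the beginning of the string, whereas indexOf() keeps scanning the rest of the word when the match is not at position 0. Here is a sketch with the loop wrapped in a hypothetical filterByPrefix function and made-up sample data:

```javascript
// Same loop as above, with startsWith() as the condition
function filterByPrefix(theArray, searchField) {
  const filteredWords = [];
  if (searchField != null) {
    const needle = searchField.toLowerCase();   // lowercase once, not per item
    for (let i = 0, l = theArray.length; i < l; ++i) {
      if (theArray[i].word.toLowerCase().startsWith(needle)) {
        filteredWords.push(theArray[i]);
      }
    }
  }
  return filteredWords;
}

// Made-up sample data in the same { word: ... } shape as the glossaries
const sample = [{ word: 'Apple' }, { word: 'Pineapple' }, { word: 'apply' }];
const prefixMatches = filterByPrefix(sample, 'APP').map(g => g.word);
console.log(prefixMatches); // [ 'Apple', 'apply' ]
```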

Upvotes: 2
