Can this string searching algorithm be optimised?

Question

Recently I attempted a 'test' on a popular coding website which posed the question:

In a given string s find the substring of size k which contains the most vowels.

Example

s = "eirowpfmsnaquisaa" and k = 5 -

Pass 1 - substring = "eirow" - Vowels = 3
Pass 2 - substring = "irowp" - Vowels = 2
Pass 3 - substring = "rowpf" - Vowels = 1
Pass 4 - substring = "owpfm" - Vowels = 1
Pass 5 - substring = "wpfms" - Vowels = 0
Pass 6 - substring = "pfmsn" - Vowels = 0
Pass 7 - substring = "fmsna" - Vowels = 1
Pass 8 - substring = "msnaq" - Vowels = 1
Pass 9 - substring = "snaqu" - Vowels = 2
Pass 10 - substring = "naqui" - Vowels = 3
Pass 11 - substring = "aquis" - Vowels = 3
Pass 12 - substring = "quisa" - Vowels = 3
Pass 13 - substring = "uisaa" - Vowels = 4 (FINAL RESULT)

I created an initial trivial iterative function which simply iterated from 0 to the string length minus the length of the substring (i=0 to i)

const countVowels = str => str.length - str.replace((/[aeiouAEIOU]/gi), "").length let findSubstring = (s,k) => { let max = 0, dict = {} for(let i=0;i<=s.length-k;i++) { // iterate over string length - k dict[i] = countVowels(s.slice(i, k+i)) // add vowels in substring to dict if(dict[i] > dict[max]) max = i // if there are more vowels than the max encountered this is the new max } return max === 0 ? "Not found!" : s.slice(max, max+k); // return substring if found }

I found this method was successful for some test cases but it failed multiple on a 10 second timeout with larger inputs.

As a second attempt I decided removing the vowel counting out of the loop and using a map would be much quicker for the larger inputs. The first attempt would being quicker for smaller inputs due to the additional overhead in V2 to calculate the final result.

The second algorithm is as follows:

const countVowels = str => str.length - str.replace((/[aeiouAEIOU]/gi), "").length let findSubstring = (s,k) => { let dict = {}, maxVowels, finalIndex for(let i=0;i<=s.length-k;i++) dict[i] = s.slice(i, k+i) // make dict of all substrings Object.values(dict).map((el, i) => dict[i] = countVowels(el)) // convert substring to vowel count maxVowels = Object.values(dict).sort((a, b) => a - b).pop() // sort to find max vowels finalIndex = parseInt(Object.keys(dict).find(key => dict[key] === maxVowels)) // search for first instance of string with max vowels return !finalIndex ? "Not found!" : s.slice(finalIndex, finalIndex+k) // return substring if found }

Interesting Note: The countVowels function uses .replace() over .match() as it affected the run time significantly with larger values, with smaller values being marginally better with .match()

String of length 11628 with K=250. The results are an average over 5 runs .replace() - V1 = 66ms, V2 = 52ms .match() - V1 = 125ms, V2 = 92ms

As a comparison on smaller values such as the length of s less than 500 and k less than 100 V1 will outperform V2 9/10 times. However, once s has a length over 500 and k over 100 you can see V2 outperforming V1 with the performance margin growing as s and k get larger (as you would expect)

However, I found that when using V2 I was still having my tests failing due to the timeout. I can't see anything that can be changed in V2 which would optimise it significantly so I am curious to hear if anyone has any ideas that would make it quicker for large inputs.

Christian Sloper · Accepted Answer

Something like this would be much faster, it is just a single linear pass with O(1) lookups:

def count_vowels(s, k):
    vowels = set("aeiouAEIOU")

    #count first k
    t = sum(1 for i in s[:k] if i in vowels)
    mx = t
    for i in range(k, len(s)):
        if s[i] in vowels:
            t+=1
        if s[i-k] in vowels:
            t-=1
        if t > mx:
            mx = t

    return mx

count_vowels("eirowpfmsnaquisaaaa",5)

And here is how this can be implemented in JavaScript:

function count_vowels(s, k) {
  const vowels = new Set(Array.from("aeiouAEIOU"));

  let t = 0;
  //count first k
  for (let i = 0; i < k && i < s.length; i++) {
    t += vowels.has(s[i]) ? 1 : 0;
  }

  let mx = t;

  for (let i = k; i < s.length; i++) {
    if (vowels.has(s[i])) {
      t += 1
    }
    if (vowels.has(s[i - k])) {
      t -= 1;
    }
    if (t > mx) {
      mx = t;
    }
  }

  return mx;
}

console.log(count_vowels("eirowpfmsnaquisaaaa", 5))

Can this string searching algorithm be optimised?

Answers (1)

Related Questions