Reputation: 105547
Suppose I have a task to write an algorithm that runs through an array of strings and checks whether each value in the array contains a given character. The algorithm will have two nested loops; here is the pseudocode:
for (let i = 0; i < a.length; i++) {        // each string in the array
  for (let j = 0; j < a[i].length; j++) {   // each character of that string
    if (a[i][j] === 'c') {
      // do something
    }
  }
}
Now, the task is to identify the runtime complexity of the algorithm. Here is my reasoning:
Let the number of elements in the array be n, and the maximum length of the string values be m. So the general formula for the complexity is n x m.
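For a quick sanity check (the sample array below is my own, just for illustration), counting the inner-loop iterations directly shows they never exceed n x m:

const a = ['cat', 'ocean', 'sky']; // n = 3, maximum string length m = 5
let ops = 0;
for (let i = 0; i < a.length; i++) {
  for (let j = 0; j < a[i].length; j++) {
    ops++; // one character comparison
  }
}
console.log(ops); // 11, which is at most n x m = 15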
Now the possible cases.
If the maximum length of the string values is equal to the number of elements, I get the complexity n x n = n^2.
If the maximum length of the string values is less than the number of elements by some number a, the complexity is n x (n - a) = n^2 - na.
If the maximum length of the string values is more than the number of elements by some number a, the complexity is n x (n + a) = n^2 + na.
Since we discard the lower-order terms, it seems that the complexity of the algorithm is n^2 in all cases. Is my reasoning correct?
Upvotes: 2
Views: 911
Reputation: 17714
Your time complexity is just the total number of characters. Which of the analyses applies depends entirely on which of your assumptions about the relationship between the length of the words and the number of words holds true. Note in particular that your statement that the time complexity is N x M, where M is the length of the longest string in the array, is not correct. It is correct in the sense that it places an upper bound, but that bound is not tight, so it is not very interesting; it is correct in the same sense that N^2 x M^2 is correct.
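To make that concrete, here is a minimal sketch (the helper name countOps and the sample input are my own assumptions) showing that the number of inner-loop executions is exactly the total character count:

// Count how many times the inner loop body runs for a given array.
function countOps(a) {
  let ops = 0;
  for (let i = 0; i < a.length; i++) {
    for (let j = 0; j < a[i].length; j++) {
      ops++; // one operation per character
    }
  }
  return ops;
}

const words = ['cat', 'ocean', 'sky'];
const totalChars = words.reduce((sum, s) => sum + s.length, 0);
console.log(countOps(words), totalChars); // 11 11 -- always equal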
I think that in many real cases of interest, your analysis is incorrect. The total number of characters is equal to the number of strings times the average number of characters per string, i.e. the average word length (note: average, not maximum!). As the number of strings becomes large, the average sampled word length will approach the mean of whatever distribution you are sampling from. So, at least for any well-behaved distribution where the sampling is i.i.d., the time complexity is simply N.
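For instance, a small simulation (the uniform length distribution on 1..10, with mean 5.5, is an arbitrary assumption on my part) shows totalChars / N settling near the distribution's mean as N grows, i.e. the total work grows linearly in N:

// Draw word lengths i.i.d. from a uniform distribution on 1..10 and
// watch the average work per string converge to the mean length, 5.5.
function randomWord() {
  const len = 1 + Math.floor(Math.random() * 10); // uniform on 1..10
  return 'x'.repeat(len);
}

for (const N of [100, 10000, 1000000]) {
  let totalChars = 0;
  for (let i = 0; i < N; i++) totalChars += randomWord().length;
  console.log(N, (totalChars / N).toFixed(2)); // approaches 5.5 as N grows
}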
A good practical example is a database that stores names. It depends, of course, on which people happen to be in the database, but if you are storing the names of, say, American citizens, then as N becomes large, the number of inner operations will approach N times the average number of characters in a US name. That latter quantity does not depend on N at all, so the total work is linear in N.
Upvotes: 1