Reputation:
I am trying to understand the running time of counting sort. In my notes, it says, assuming the size of the array A is n, and k is the number of times each number occurs,
Counting-Sort(A,k) {
for (i=1; i <= k; i++) // initialize number counters to 0
times[i] = 0;
for (j=1; j <= length[A]; j++) // decide how many times each
times[A[j]]++; // number appears in the input
// form the sorted array
m=1;
for ( i=1; i <= k; i++) // consider each number in the range
for ( j=1; j <= times[ i ]; j++) { // generate that number in
A[m]=i; // the output as many times as
m++; // it occurs in the input
}
}
Let ti denote the number of times the inner loop iterates for each i. When we look at the nested for loops at the bottom of the code, note that, every time the inner loop is iterated, we are placing a new number into its correct place in the output array. Hence: sum ti (from i=1 to k) = n.
I don't understand why this sum is equal to n. The outer loop iterates k times, and inner loop may iterate at most n times, so it must be O(nk). Can somebody explain? Thanks
Upvotes: 4
Views: 5373
Reputation: 5940
Algorithm
Counting Sort, known also as Histogram Sort
n = length of input Array
k = number of unique symbols that appear in the input Array
Initialization takes k
time
Counting takes n
time
Enumeration takes Sum { Count(i) } = n
time
Complexity
Time = k + n + n = 2n+k
Time ~ O(2n+k) = O(n+k)
Space = n + k ~ O(n+k)
Upvotes: 3
Reputation: 2093
As you can see for the loop
m=1;
for ( i=1; i <= k; i++) // consider each number in the range
for ( j=1; j <= times[ i ]; j++) { // generate that number in
A[m]=i; // the output as many times as
m++; // it occurs in the input
}
its complexity will be equal to how many times the body of inner cycle is executed. When i = 1
, it is executed times[1]
times, when i = 2
=> times[2]
times, and so, when i = k
, times[k]
times. So in total inner body is executed times[1] + times[2] + ... + times[k]
times, and that sum is equal to n
, as each element is counted once.
Upvotes: 0
Reputation: 18530
The notes aren't entirely correct. k
is the highest number that shows up in your array (i.e. it has numbers from 1 to k). So, let's step through the algorithm:
k
"bins": O(k)
O(n)
n
, because that's how many entries we had in total in the arrayO(k)
O(n)
The important thing here is that we know that even though we iterate over k
bins and there could, in general, be up to n
values represented by each bin, we set up the bins so that, across all the iterations of the outer loop, the inner loop still runs for a total of n
times. Hence, the total complexity of this step is actually O(n)
.
If we ignored our extra knowledge about the contents of the bins, you'd be absolutely right.
Now there's a final twist: if k > n
, it's actually in O(k)
rather than O(n)
! Because now the outer loop is the thing that "runs more often".
I hope this makes some sense.
Upvotes: 0