A data structure for counting integers within some range?

Question

Question:

Given n integers in the range [1, k], preprocesses its input and then answers any query about how many of the n integers have values between a and b, where 1 ≤ a, b ≤ k are two given parameters. Your algorithm should use O(n + k) preprocessing time.

templatetypedef · Accepted Answer

Your algorithm is reasonably good, but it can be made much faster. Specifically, your algorithm has O(1) preprocessing time, but then spends O(n) time per query because of the linear cost of the time required to do the partitioning step.

Let's consider an alternative approach. Suppose that all of your values were in sorted order. In this case, you could find the number of elements in a range very quickly by just doing two binary searches - a first binary search to find the index of the lower bound, and a second search to find the upper bound - and could just subtract the indices. This would take time O(log n). If you can preprocess the input array to sort it in time O(n + k), then this approach will result in exponentially faster lookup times.

To do this sorting, as @minitech has pointed out, you can use the counting sort algorithm, which sorts in time O(n + k) for integers between 1 and k. Consequently, using both counting sort and the binary search together gives O(n + k) setup time and O(log n) query time.

If you are willing to trade memory for efficiency, though, you can speed this up even further. Let's suppose that k is a reasonably small number (say, not more than 100). Then if you are okay using O(k) space, you can answer these queries in O(1) time. The idea is as follows: build up a table of k elements that represents, for each element k, how many elements of the original array are smaller than k. If you have this array, you can find the total number of elements in some subrange by looking up how many elements are less than b and how many elements are less than a (each in O(1) time), then subtracting them.

Of course, to do this, you have to actually build up this table in time O(n + k). This can be done as follows. First, create an array of k elements, then iterate across the original n-element array and for each element increment the spot in the table corresponding to this number. When you're done (in time O(n + k)), you will have filled in this table with the number of times that each of the values in the range 1 - k exists in the original array (this is, incidentally, how counting sort works). Next, create a second table of k elements that will hold the cumulative frequency. Then, iterate across the histogram you built in the first step, and fill in the cumulative frequency table with the cumulative total number of elements encountered so far as you walk across the histogram. This last step takes time O(k), for a grand total of time O(n + k) for setup. You can now answer queries in time O(1).

Hope this helps!

A data structure for counting integers within some range?

Answers (2)

Related Questions