Reputation: 9
I am trying to solve the following problem:
Given an array A of n elements, we have to answer m queries of type i,j,X. For each query we have to output the numbers in range i,j that are greater than X.
E.g.:
If the array is:
3 4 1 7
and the query was 1 4 3 i.e. we have to output numbers greater than 3 in the range 1 to 4.
Output: 2
Since three numbers are greater than 3 (4, 7)
Constraints:
1 < n < 10^5
1 < A[i] < 10^9
My Approach:
I tried to approach it using segments tree of segments of sqrt(n). It gives the time complexity of O(sqrt(n)).
Is there any other method to solve it in a smaller complexity?
Upvotes: 0
Views: 635
Reputation: 3390
#include <stdio.h>
#include <stdlib.h>
/*
/*
This file is both C and Markdown.
The data structure you are looking for needs to answer "three-sided range queries". They are called three-sided because you can picture your array as representing n points in a two-dimensional set, where the x coordinate is the array index and the y coordinate is the value at that index; your query then amounts to "print all of the y coordinates of points (x,y) where i <= x <= j and y > X". This is three inequalities on the values printed.
One very simple data structure that can support three-sized range queries is a priority search tree (PST).
*/
typedef struct NODE {
int y_max;
struct NODE *left, *right;
} Node;
/*
For your usage, you can use a very simple priority search tree. The tree will have 2*n - 1 nodes. Leaf nodes are associated with a single location in the array. Non-leaf nodes are associated with a contiguous region of the array. The association is as follows:
The root is associated with the whole array: positions [0, n)
The left child of a node with range [a,b) is associated with positions [a, floor((a + b)/2))
The right child of a node with range [a,b) is associated with positions [floor((a+b)/2), b)
If a region is empty, no node is stored for it.
The association is implicit and not stored anywhere; it can be inferred from n and the shape of the tree.
Each node additionally stores the maximum value among all of the values stores in its associated region.
For instance, if your array is {60, 70, 80, 90, 100}, then the tree nodes and their associated regions and values are:
[0,5):100
/ \
[0,2):70 [2,5):100
/ \ / \
[0,1):60 [1,2):70 [2,3):80 [3,5):100
/ \
[3,4):90 [4,5):100
Constructing a PST takes linear time using recursion:
*/
int Max(int x, int y) { return x > y ? x : y; }
Node * Construct(int n, int ys[]) {
if (!n) return NULL;
Node *result = malloc(sizeof(Node));
if (1 == n) {
result->y_max = ys[n];
result->left = result->right = NULL;
} else {
// To find y_max, we first recurse:
result->left = Construct(n / 2, ys);
result->right = Construct(n - n / 2, ys + n / 2);
// The the y_max is the max of the child y_max values:
result->y_max = Max(result->left->y_max, result->right->y_max);
}
return result;
}
/*
To query a PST, you need to find all leaves of the tree that are in
the given [i, j]
region with y_max > X
. This can also be done
recursively:
*/
void Query(int a, int b, Node *pst, int i, int j, int X) {
if (!pst || a > j || b < i || pst->y_max <= X) return;
if (b - a == 1) printf("%d ", pst->y_max);
Query(a, (a + b) / 2, pst->left, i, j, X);
Query((a + b) / 2, b, pst->right, i, j, X);
}
/*
A careful accounting shows that the time complexity is O(log n + k), where k is the number of nodes reported. Note that the lower bound on any data structure supporting these queries is Ω(k), since it takes that much time just to print the results.
You can find many careful accountings by just Googling "priority search tree".
The data structure above is sub-optimal in a number of ways, but it is simple to explain this way. It can be organized to be stored in an array of size n/2 + O(1), rather than Θ(n) tree nodes as above.
Updates can be performed in O(log n) time by traversing down the tree
recursively and rebuilding y_max
on the way back up:
*/
void Update(int i, int v, int a, int b, Node *pst) {
if (a + 1 == b) {
pst->y_max = v;
return;
}
// Recurse down one subtree:
int mid = (a + b) / 2;
if (i < mid) {
Update(i, v, a, mid, pst->left);
} else {
Update(i, v, mid, b, pst->right);
}
pst->y_max = Max(pst->left->y_max, pst->right->y_max);
}
Upvotes: 0
Reputation: 65498
The data structure that you're looking for is a 2D range tree. The following approach, with O(sqrt(n) log n) operation time, may be easier to implement however. (I'll leave the improvement to O(sqrt(n log n)) as an exercise.)
Divide the cows into sqrt(n) contiguous blocks of sqrt(n) cows. For each block, store the signs normally and additionally in sorted order. When an M query is processed, make the necessary change and resort (time O(sqrt(n) log n)). When a C query is processed, use brute force in the unsorted arrays for the partially overlapped blocks (time O(sqrt(n))) and binary search in the sorted arrays for the wholly contained blocks (time O(sqrt(n) log n)).
Here's the O(log^2 n) query time version. Keep a segment tree of sorted multisets, where each sorted multiset containing the signs for the cows in the segment. When an M query is processed, delete the cow's old sign from all of the multisets for segments containing that cow. Reinsert the cow's new sign in a similar fashion. When a C query is processed, partition the query interval into O(log n) segments and examine the number of elements in each sorted multiset that are in range. The best way to support this latter operation probably is a binary search tree where each node stores the number of nodes in its left child's subtree. The reason that I didn't suggest this first is that (i) it requires a lot more implementation effort (ii) for n = 100000, the difference in running time functions is sqrt(n)/log(n)**(3/2) ~ 8, which more than likely will be swallowed up by the relative cache-friendliness of the two approaches and the additional complexity of the latter.
Upvotes: 4