Reputation: 647
I have the following problem I need to optimize. For a given array(with duplicated keys allowed), for each position i
in the array, I need to compute all bigger values right of i
, and all smaller values left of i
. If we have:
1 1 4 3 5 6 7
and i = 3
(value 3), the count of smaller values to left of i
is 1
(no repeated keys), and to the right, the number of bigger values is 3
.
The brute force solution of this problem is ~N^2
, and with some extra space I can manage to compute the smaller values from the bigger ones, so reducing complexity to ~(N^2)/2
.
My question is: is there a faster way to get it done? Maybe NlgN
? I imagine there is a data structure out there I don't know which will allow me to do the computation faster.
EDIT: Thank you all for your replies and discussions. You can find two good solutions two the problem below. Always a pleasure learning from developers in stackoverflow.
Upvotes: 3
Views: 1621
Reputation: 9107
Here's an O(n log n)
solution.
As hinted by @SayonjiNakate, the solution using segment tree (I used Fenwick tree in my implementation) runs in O(n log M)
time, where M
is the maximum possible value in the array.
Firstly, note that the problem "number of smaller elements on the left" is equivalent to the problem "number of greater elements on the right" by reversing and negating the array. So, in my explanation below I only describe the "number of smaller elements on the left", which I call "lesser_left_count".
Algorithm for lesser_left_count:
The idea is to be able to find the total of numbers smaller than a specific number.
Define an array tree
with size upto MAX_VALUE
, which will store the value 1
for seen numbers and 0
otherwise.
Then as we traverse the array, when we see a number num
, just assign the value 1
to tree[num]
(update operation). Then lesser_left_count for a number num
is the sum from 1
to num-1
(sum operation) so far, since all smaller numbers to the left of current position would have been set to 1
.
Simple right? If we use Fenwick tree, the update and sum operation can be done each in O(log M)
time, where M
is the maximum possible value in the array. Since we are iterating over the array, total time is O(n log M)
.
The only disadvantage of the naive solution is that it uses a lot of memory as M
gets bigger (I set M=2^20-1
in my code, which take around 4MB of memory). This can be improved by mapping distinct integers in the array into smaller integers (in a way that preserve the order). The mapping can be done in simply O(n log n)
by sorting the array. So the number M
can be reinterpreted as "number of distinct elements in the array".
So the memory wouldn't be any problem anymore, because if after this improvement you indeed need huge memory, that means there are that many distinct numbers in your array, and the time complexity of O(n)
will already be too high to be calculated in normal machine anyway.
For the sake of simplicity, I didn't include that improvement in my code.
Oh, and since Fenwick tree only works for positive numbers, I converted the numbers in the array to be minimum 1. Note that this doesn't change the result.
Python code:
MAX_VALUE = 2**20-1
f_arr = [0]*MAX_VALUE
def reset():
global f_arr, MAX_VALUE
f_arr[:] = [0]*MAX_VALUE
def update(idx,val):
global f_arr
while idx<MAX_VALUE:
f_arr[idx]+=val
idx += (idx & -idx)
def cnt_sum(idx):
global f_arr
result = 0
while idx > 0:
result += f_arr[idx]
idx -= (idx & -idx)
return result
def count_left_less(arr):
reset()
result = [0]*len(arr)
for idx,num in enumerate(arr):
cnt_prev = cnt_sum(num-1)
if cnt_sum(num) == cnt_prev: # If we haven't seen num before
update(num,1)
result[idx] = cnt_prev
return result
def count_left_right(arr):
arr = [x for x in arr]
min_num = min(arr)
if min_num<=0: # Got nonpositive numbers!
arr = [min_num+1+x for x in arr] # Convert to minimum 1
left = count_left_less(arr)
arr.reverse() # Reverse for greater_right_count
max_num = max(arr)
arr = [max_num+1-x for x in arr] # Negate the entries, keep minimum 1
right = count_left_less(arr)
right.reverse() # Reverse the result, to align with original array
return (left, right)
def main():
arr = [1,1,3,2,4,5,6]
(left, right) = count_left_right(arr)
print 'Array: ' + str(arr)
print 'Lesser left count: ' + str(left)
print 'Greater right cnt: ' + str(right)
if __name__=='__main__':
main()
will produce:
Original array: [1, 1, 3, 2, 4, 5, 6] Lesser left count: [0, 0, 1, 1, 3, 4, 5] Greater right cnt: [5, 5, 3, 3, 2, 1, 0]
or if you want Java code:
import java.util.Arrays;
class Main{
static int MAX_VALUE = 1048575;
static int[] fArr = new int[MAX_VALUE];
public static void main(String[] args){
int[] arr = new int[]{1,1,3,2,4,5,6};
System.out.println("Original array: "+toString(arr));
int[][] leftRight = lesserLeftRight(arr);
System.out.println("Lesser left count: "+toString(leftRight[0]));
System.out.println("Greater right cnt: "+toString(leftRight[1]));
}
public static String toString(int[] arr){
String result = "[";
for(int num: arr){
if(result.length()!=1){
result+=", ";
}
result+=num;
}
result+="]";
return result;
}
public static void reset(){
Arrays.fill(fArr,0);
}
public static void update(int idx, int val){
while(idx < MAX_VALUE){
fArr[idx]+=val;
idx += (idx & -idx);
}
}
public static int cntSum(int idx){
int result = 0;
while(idx > 0){
result += fArr[idx];
idx -= (idx & -idx);
}
return result;
}
public static int[] lesserLeftCount(int[] arr){
reset();
int[] result = new int[arr.length];
for(int i=0; i<arr.length; i++){
result[i] = cntSum(arr[i]-1);
if(cntSum(arr[i])==result[i]) update(arr[i],1);
}
return result;
}
public static int[][] lesserLeftRight(int[] arr){
int[] left = new int[arr.length];
int min = Integer.MAX_VALUE;
for(int i=0; i<arr.length; i++){
left[i] = arr[i];
if(min>arr[i]) min=arr[i];
}
for(int i=0; i<arr.length; i++) left[i]+=min+1;
left = lesserLeftCount(left);
int[] right = new int[arr.length];
int max = Integer.MIN_VALUE;
for(int i=0; i<arr.length; i++){
right[i] = arr[arr.length-1-i];
if(max<right[i]) max=right[i];
}
for(int i=0; i<arr.length; i++) right[i] = max+1-right[i];
right = lesserLeftCount(right);
int[] rightFinal = new int[right.length];
for(int i=0; i<right.length; i++) rightFinal[i] = right[right.length-1-i];
return new int[][]{left, rightFinal};
}
}
which will produce same result.
Upvotes: 3
Reputation: 19020
Here is an algorithm which should give you O(NlgN)
:
Iterate over the list once and build a map of key => indexList
. So for ever key (element in the array) you store a list of all the indices where that key is in the array. This will take O(N)
(iterate over the list) + N*O(1)
(appending N items to lists) steps. So this step is O(N)
. The second step requires that these lists are sorted which they will be as we are iterating over the list from left to right so a newly inserted index in a list will always be larger than all the other ones which are already in there.
Iterate over the list again and for each element search the index lists for all keys which are larger than the current element for the first index which is after the current index. This gives you the number of elements to the right of the current one which are larger than the current element. As the index lists are sorted you can do a binary search which will take O(k * lgN)
steps with k
being the number of keys larger then the current one. If the number of keys has an upper limit then this is a constant as far as big-O is concerned. The second step here is to search all smaller keys and find the first index in the list which is prior to the current one. This will give you the number of element to the left of the current one which are smaller. Same reasoning as above this is O(k * lgN)
So assuming the number of keys is limited this should give you O(N) + N * 2 * O(lgN)
so overall O(NlgN)
if I'm not mistaken.
Edit: Pseudo code:
int[] list;
map<int => int[]> valueIndexMap;
foreach (int i = 0; i < list.length; ++i) { // N iterations
int currentElement = list[i]; // O(1)
int[] indexList = valueIndexMap[currentElement]; // O(1)
indexList.Append(i); // O(1)
}
foreach (int i = 0; i < list.length; ++i) { // N iterations
int currentElement = list[i]; // O(1)
int numElementsLargerToTheRight;
int numElementsSmallerToTheLeft;
foreach (int k = currentElement + 1; k < maxKeys; ++k) { // k iterations with k being const
int[] indexList = valueIndexMap[k]; // O(1)
int firstIndexBiggerThanCurrent = indexList.BinaryFindFirstEntryLargerThan(i); // O(lgN)
numElementsLargerToTheRight += indexList.Length - firstIndexBiggerThanCurrent; // O(1)
}
foreach (int k = currentElement - 1; k >= 0; --k) { // k iterations with k being const
int[] indexList = valueIndexMap[k]; // O(1)
int lastIndexSmallerThanCurrent = indexList.BinaryFindLastEntrySmallerThan(i); // O(lgN)
numElementsSmallerToTheLeft += lastIndexSmallerThanCurrent; // O(1)
}
}
Update: I tinkered around with a C# implementation in case anyone is interested;
Upvotes: 0
Reputation: 1627
Here's a relatively simple solution that's O(N lg(N))
that doesn't rely on the entries being among finitely many integers (in particular, it should work for any ordered data type).
We assume the output is to be stored in two arrays; lowleft[i]
will at the end contain the number of distinct values x[j]
with j < i
and x[j] < x[i]
, and highright[i]
will at the end contain the number of distinct values x[j]
with j > i
and x[j] > x[i]
.
Create a balanced tree data structure that maintains in each node, the number of nodes in the subtree rooted at that node. This is fairly standard, but not a part of the Java standard library I think; it's probably easiest to do an AVL tree or so. The type of the values in the nodes should be the type of the values in your array.
Now first iterate forward through the array. We start with an empty balanced tree. For every value x[i]
we encounter, we enter it into the balanced tree (near the end there are O(N)
entries in this tree, so this step takes O(lg(N))
time). When searching for the position to enter x[i]
, we keep track of the number of values less than x[i]
by adding up the sizes of all left subtrees whenever we take the right subtree, and adding what will be the size of the left subtree of x[i]
. We enter this number into lowleft[i]
.
If the value x[i]
is already in the tree, we just carry on with the next iteration of this loop. If the value x[i]
is not in there, we enter it and rebalance the tree, taking care to update the subtree sizes correctly.
Each iteration of this loop takes O(lg(N))
steps, for a total of O(N lg(N))
. We now start with an empty tree and do the same thing iterating backward through the array, finding the position for every x[i]
in the tree, and every time recording the size of all subtrees to the right of the new node as highright[i]
. Total complexity therefore O(N lg(N))
.
Upvotes: 2
Reputation: 3541
Try segment tree data structure used for solving RMQ. It would give you exactly n log n.
And look through RMQ problem generally, your problem may be reduced to it.
Upvotes: 2