Reputation: 5437
This is not a real-life question; it is just theory-crafting.
I have a big array whose elements look like [1, 140, 245, 123443], all integers or floats with low selectivity: the number of unique values is ten times smaller than the size of the array. B*tree indexing is not a good fit in this case.
I also tried to implement bitmap indexing, but bitwise operations in Ruby are not that fast.
Are there any good algorithms for searching two-dimensional arrays of fixed-size vectors?
And the main question is: how do I convert the vector into a single value, where the conversion function has to be monotonic, so that I can apply range queries such as:
(v[0] < 10, v[2] > 100, v[3] = 32, 0.67*10^-8 < v[4] < 1.2154241410*10^-6)
The only idea I have is to create a separate sorted index for each component of the vector, binary search each index, and then merge the results, but that is a bad idea because in the worst-case scenario it would require O(N*N) operations.
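One way to pin that idea down in Ruby is sketched below (ColumnIndex and the sample data are purely illustrative, not an existing library):

    # Rough sketch of the per-component idea: one sorted index per column,
    # a range lookup via binary search, then intersect the row-id sets.
    class ColumnIndex
      def initialize(rows, col)
        # [value, row_id] pairs, sorted by value
        @entries = rows.each_with_index.map { |row, i| [row[col], i] }.sort
      end

      # row ids whose value falls in lo..hi
      def range(lo, hi)
        first = @entries.bsearch_index { |(v, _)| v >= lo } || @entries.size
        ids = []
        @entries[first..].each do |(v, id)|
          break if v > hi
          ids << id
        end
        ids
      end
    end

    rows    = [[1, 140, 245, 123_443], [7, 90, 310, 50]]
    indexes = (0...rows.first.size).map { |c| ColumnIndex.new(rows, c) }
    # v[0] < 10 AND v[2] > 100: intersect the two candidate id lists
    hits = indexes[0].range(-Float::INFINITY, 9) & indexes[2].range(101, Float::INFINITY)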
Upvotes: 1
Views: 589
Reputation: 66922
Assuming that each "column" is roughly evenly distributed over a known range, you could keep a series of buckets for each column, together with a list of the rows that fall into each bucket. The number of buckets per column can be the same or different; it's totally arbitrary. More buckets are faster but take slightly more memory.
my table:
range: {1to10} {1to4b} {-2bto2b}
row1: {7 3427438335 420645075}
row2: {5 3862506151 -1555396554}
row3: {1 2793453667 -1743457796}
buckets for column 1:
bucket{1-3} : row3
bucket{4-6} : row2
bucket{7-10} : row1
buckets for column 2:
bucket{1-2b} :
bucket{2b-4b} : row1, row2, row3
buckets for column 3:
bucket{-2b--1b} : row2, row3
bucket{-1b-0} :
bucket{0-1b} :
bucket{1b-2b} : row1
Then, given a series of criteria, say {v[0] <= 5, v[1] > 3*10^9}, we pull out the buckets that match each criterion:
column 1:
v[0] <= 5 matches buckets {1-3} and {4-6}, which is rows 2 and 3.
column 2:
v[1] > 3*10^9 matches bucket {2b-4b}, which is rows 1, 2 and 3.
column 3:
"" matches all , which is rows 1, 2 and 3.
Now we know that the rows we're looking for must appear in a matching bucket for every column, so we keep only the rows that showed up in all three lists; in this case, rows 2 and 3. At this point the number of rows remaining will be small even for massive amounts of data, depending on the granularity of your buckets. You simply check each remaining row against the exact criteria. In this sample, row 2 matches but row 3 doesn't.
This algorithm is technically still O(n), but in practice, with a large number of small buckets, it can be very fast.
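In Ruby, a minimal sketch of that bucket scheme could look like this (BucketIndex, the bucket count, and the per-column min/max are assumptions made for illustration):

    # Fixed-width buckets per column, each holding the ids of the rows that
    # fall into it; the bucket count is arbitrary (more buckets = fewer false hits).
    class BucketIndex
      def initialize(rows, col, min, max, buckets: 64)
        @min, @width, @count = min, (max - min).fdiv(buckets), buckets
        @buckets = Array.new(buckets) { [] }
        rows.each_with_index { |row, id| @buckets[slot(row[col])] << id }
      end

      # ids of every row whose bucket overlaps lo..hi (a superset of the true matches)
      def candidates(lo, hi)
        @buckets[slot(lo)..slot(hi)].flatten
      end

      private

      def slot(value)
        ((value - @min) / @width).floor.clamp(0, @count - 1)
      end
    end

    rows = [[7, 3_427_438_335], [5, 3_862_506_151], [1, 2_793_453_667]]
    col0 = BucketIndex.new(rows, 0, 1, 10, buckets: 3)
    col0.candidates(1, 5)   # => [2, 1], i.e. row3 and row2 from the example above

Each constrained column yields a candidate list like this; the lists are then intersected and the few surviving rows are checked exactly against the original data.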
Upvotes: 2
Reputation: 6814
Using an index :)
The basic idea is to turn the two-dimensional array into a one-dimensional sorted array (while keeping the original position) and apply binary search on the latter.
This method works for any n-dimensional array and is widely used by databases, which can be seen as n-dimensional arrays with variable lengths.
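A minimal Ruby sketch of that idea, assuming each cell is kept as a [value, row, column] triple so the original position survives the sort:

    # Flatten the 2-D array into [value, row, col] triples, sort them by value,
    # and binary search the sorted list for the value of interest.
    table = [[1, 140, 245], [7, 90, 310]]

    sorted = table.each_with_index.flat_map do |row, r|
      row.each_with_index.map { |value, c| [value, r, c] }
    end.sort

    # first entry whose value is >= 140
    hit = sorted.bsearch { |(value, _, _)| value >= 140 }
    # => [140, 0, 1]  (value 140 sits at row 0, column 1 of the original table)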
Upvotes: 0