Yuri  Barbashov
Yuri Barbashov

Reputation: 5437

Is there a search algorithm for huge two-dimensional arrays?

This is not a real-life question, it is just theory-crafting.

I have a big array which consists of elements like [1,140,245,123443], all integer or floats with low selectivity, and the number of unique values is ten times less than the size of the array. B*tree indexing is not good in this case.

I also tried to implement bitmap indexing, but in Ruby, binary operations are not so fast.

Are there any good algorithms for searching two-dimensional arrays of fixed size vectors?

And, the main question is, how do I convert the vector in value, where the conversion function has to be monotonic, so I can apply range queries such as:

(v[0]<10, v[2]>100, v[3]=32, 0.67*10^-8<v[4]<1.2154241410*10^-6)

the only idea i have is to create separate sorted indexes for each component of vector...binary search then and merge...but it is a bad idea because in the worst case scenario it will require O(N*N) operations...

Upvotes: 1

Views: 589

Answers (2)

Mooing Duck
Mooing Duck

Reputation: 66922

Assuming that each "column" is vaguely evenly distributed in a known range, you could keep track of a series of buckets for each column, and a list of rows that satisfy the bucket. The number of buckets for each column can be the same, or different, it's totally arbitrary. More buckets is faster, but takes slightly more memory.

my table:
range:    {1to10} {1to4m}    {-2mto2m}
row1:     {7      3427438335 420645075}
row2:     {5      3862506151 -1555396554}
row3:     {1      2793453667 -1743457796}

buckets for column 1:
bucket{1-3} : row3
bucket{4-6} : row2
bucket{7-10} : row1

buckets for column 2:
bucket{1-2m} : 
bucket{2m-4m} : row1, row2, row4

buckets for column 3:
bucket{-2m--1m} : row2, row3
bucket{-1m-0} : 
bucket{0-1m} : 
bucket{1m-2m} : row1

Then, given a series of criteria: {v[0]<=5, v[2]>3*10^10}, we pull out the buckets that match that criteria:

 column 1:
v[0]<=5 matches buckets {1-3} and {4-6}, which is rows 2 and 3.
 column 2:
v[2]>3*10^10} matches buckets {2m-4m} and {4-6}, which is rows 1, 2 and 3.
 column 3:
"" matches all , which is rows 1, 2 and 3.

Now we know that the row(s) we're looking for meet all three criteria, so we list all the rows that are in the buckets that matched all the criteria, in this case, rows 2 and 3. At this point, the number of rows remaining will be small even for massive amounts of data, depending on the granularity of your buckets. You simply check each of the rows that is left at this point to see if they match. In this sample we see that row 2 matches, but row 3 doesn't.

This algorithm is technically O(n), but in practice, if you have large numbers of small buckets, this algorithm can be very fast.

Upvotes: 2

Samy Arous
Samy Arous

Reputation: 6814

Using an index :)

The basic idea is to turn the 2 dimensional array into a 1 dimensional sorted array(while keeping the original position) and apply binary search on the later.

This method works for any n dimensional array and is used widely by databases which can be seen as a n dimensional array with variable lengths.

Upvotes: 0

Related Questions