MrB
MrB

Reputation: 59

Fastest Search algorithm in 2D array

So, I have a 2D array, int a[X][Y]; X can go up to 10 000 000 and Y is maximum 6. Given an array int v[Z] (Z <= Y), I have to see if I find a line in a that contains all the elements from v.

What would be the fastest algorithm for this matter and how would you implement this?

I have already tried the classic method of taking line by line and then with the 2 fors search, one for v elements and one for a elements but it takes too long.

What would be the best (fastest) approach ?

int check()
{
    int nrfound;

    for (int l = 0; l < lines_counter; l++) for each line in a array
    {
        nrfound = 0;

        for (int i = 0; i < n; i++) { // for each element in v array

            for (int j = 0; j < m; j++) // for each element in a[l] line
                if (v[i] == a[l][j])
                    nrfound++;
            if (nrfound == Z)
                return 0;
        }
    }
    return 1;
}

Upvotes: 1

Views: 3672

Answers (5)

chqrlie
chqrlie

Reputation: 145317

Your algorithm has a flaw if there are duplicate elements in the a[i][] subarrays. A matching element of v will be counted multiple times and the count may equal Z by coincidence.

Here is a corrected version:

int check(int X, int Y, int Z, int a[X][Y], int v[Z]) {
    for (int x = 0; x < X; x++) {
        // for each line in array a
        int mask = 0;
        for (int z = 0; z < Z; z++) {
            // for each element in array v
            for (int y = 0, m = 1; y < Y; y++, m <<= 1) {
                // for each element in line a[x]
                if (v[z] == a[x][y] && !(mask & m)) {
                    mask |= m;
                    break;
                }
            }
            if (y == Y)
                break;
        }
        if (z == Z)
            return 0;   // found a match
        }
    }
    return 1;   // no match
}

Unfortunately, the above code might be even slower than the posted one, but it is worth testing as the inner loop is exited as soon as a element from v is not found in a[x].

Upvotes: 0

akshay dhule
akshay dhule

Reputation: 167

As you're working in C, it limits available data structures: I would suggest :

  • Initialize N threads, divide matrix rows X in N buckets, and run searching for each bucket in parallel.
  • Depending on type of 2D input array : You can save some time with boundary conditions as you want all elements of query array maintain the order. You can also make use of (Z <= Y) Length of each line as to match if should first match the length.

  • Sorting the array will add complexity to it. So better to avoid it.

Upvotes: 0

Thowk
Thowk

Reputation: 407

I see three things to consider:

  • Using threads.
  • If it's possible, when constructing int a[X][Y] table I would create additional array int[6][Y] which will contain:
    • List of indexes which contain 1, 2, 3 .. 6 elements. This allows you to narrow the search.
  • For each X count Hash of it's values. Then count Hash of V values.
    • Compare Hash code, instead of each separate value.

Upvotes: 2

MBo
MBo

Reputation: 80325

For the case of reusing the same array a[] with multiple different v[]:

Sort every line of a[][] as preliminary step (executed once)

Sort v[]

Use single loop (instead of two) to get intersection of ordered v[] and every ordered line of a[] - with approach like merge procedure of merge sort

index_v = 0
index_a = 0
while index_v < length_v and index_a < length_a:
   if v[index_v] == a[index_a]
       index_v++, index_a++
   else if v[index_v] < a[index_a]
       index_v++   
   else
     index_a++
if index_v == length_v:
   return OK, a[] line contains all v elements

Upvotes: 1

Aki Suihkonen
Aki Suihkonen

Reputation: 20087

Sorting 1e7 arrays of size 6 can be easily parallelized using fixed sorting network with or without Simd/multithreading.

Sort v and compare that with same principle as merge sorting two sorted lists.

The overall worst case complexity is between 13e7..24e7 comparisons (sorting network for 6 elements requires 12 conditional swaps and merging v/a[n] requires 1..12 comparisons.

Upvotes: 0

Related Questions