Reputation: 59
So, I have a 2D array, int a[X][Y];
X
can go up to 10 000 000 and Y
is maximum 6.
Given an array int v[Z]
(Z <= Y), I have to see if I find a line in a that contains all the elements from v
.
What would be the fastest algorithm for this matter and how would you implement this?
I have already tried the classic method of taking line by line and then with the 2 fors search, one for v elements and one for a elements but it takes too long.
What would be the best (fastest) approach ?
int check()
{
int nrfound;
for (int l = 0; l < lines_counter; l++) for each line in a array
{
nrfound = 0;
for (int i = 0; i < n; i++) { // for each element in v array
for (int j = 0; j < m; j++) // for each element in a[l] line
if (v[i] == a[l][j])
nrfound++;
if (nrfound == Z)
return 0;
}
}
return 1;
}
Upvotes: 1
Views: 3672
Reputation: 145317
Your algorithm has a flaw if there are duplicate elements in the a[i][]
subarrays. A matching element of v
will be counted multiple times and the count may equal Z
by coincidence.
Here is a corrected version:
int check(int X, int Y, int Z, int a[X][Y], int v[Z]) {
for (int x = 0; x < X; x++) {
// for each line in array a
int mask = 0;
for (int z = 0; z < Z; z++) {
// for each element in array v
for (int y = 0, m = 1; y < Y; y++, m <<= 1) {
// for each element in line a[x]
if (v[z] == a[x][y] && !(mask & m)) {
mask |= m;
break;
}
}
if (y == Y)
break;
}
if (z == Z)
return 0; // found a match
}
}
return 1; // no match
}
Unfortunately, the above code might be even slower than the posted one, but it is worth testing as the inner loop is exited as soon as a element from v
is not found in a[x]
.
Upvotes: 0
Reputation: 167
As you're working in C, it limits available data structures: I would suggest :
Depending on type of 2D input array : You can save some time with boundary conditions as you want all elements of query array maintain the order. You can also make use of (Z <= Y) Length of each line as to match if should first match the length.
Sorting the array will add complexity to it. So better to avoid it.
Upvotes: 0
Reputation: 407
I see three things to consider:
int a[X][Y]
table I would create additional array int[6][Y]
which will contain:
Upvotes: 2
Reputation: 80325
For the case of reusing the same array a[] with multiple different v[]:
Sort every line of a[][] as preliminary step (executed once)
Sort v[]
Use single loop (instead of two) to get intersection of ordered v[] and every ordered line of a[] - with approach like merge
procedure of merge sort
index_v = 0
index_a = 0
while index_v < length_v and index_a < length_a:
if v[index_v] == a[index_a]
index_v++, index_a++
else if v[index_v] < a[index_a]
index_v++
else
index_a++
if index_v == length_v:
return OK, a[] line contains all v elements
Upvotes: 1
Reputation: 20087
Sorting 1e7 arrays of size 6 can be easily parallelized using fixed sorting network with or without Simd/multithreading.
Sort v
and compare that with same principle as merge sorting two sorted lists.
The overall worst case complexity is between 13e7..24e7 comparisons (sorting network for 6 elements requires 12 conditional swaps and merging v/a[n] requires 1..12 comparisons.
Upvotes: 0