cnick
cnick

Reputation: 332

How to find certain arrays inside an array of arrays

I have an array of arrays. Example:

[
 [a, b, c],
 [b, d],
 [a, c],
 [b],
 [a, b, c, d]
]

In practice, inner arrays will be relatively short, but the outer array will be a few thousand rows.

Now, I have a reference array. Example:

[a, b, c]

The challenge is to determine which rows of the first array are completely contained in the reference array. For this example, I would want rows 1, 3, and 4 returned to me after this check.

I can do this using a nested loop pretty easily, but I was wondering if there was a clever algorithm that would be more scalable.

Recap: I need the row numbers that are some subset of the reference array. Row elements are not necessarily unique. EDIT: There are about 200 unique elements that can be inside each array. The array of arrays will be constant, but I will need to perform this same search for many different reference arrays

Upvotes: 1

Views: 83

Answers (3)

msb
msb

Reputation: 4408

If you need to do this operation several times for the same array of arrays, and the inner arrays besides being short have a non-extensive set of possible elements, and they never have repeat elements (thanks @Kunukn), you can create an alternate structure turning your inner arrays into numbers and doing a "XOR" and "OR" between the element to be found and the array. That way, you'll loop your array only once. What's more, you may sort your array of arrays by value, which might be useful for other purposes. Example: a is 1, b is 2, c is 4, d is 8. Your sample array of arrays become:

[
 [a, b, c], // 1 + 2 + 4 == 7
 [b, d], // 2 + 8 == 10
 [a, c], // 1 + 4 == 5
 [b], // 2 == 2
 [a, b, c, d] // 1 + 2 + 4 + 8 == 15
]

Then, your reference array [a, b, c] is calculated into 7, the same way as the 1st element in the above array of arrays. Now, to search the element. Assuming a 4-bit integer to represent them, you'd have:

7 XOR 7 => 1110 XOR 1110 == 0000; 0000 OR 1110 == 1110 => equal to 7, found
10 XOR 7 => 0101 XOR 1110 == 1011; 1011 OR 1110 == 1111 => not equal to 7, not found
5 XOR 7 => 1010 XOR 1110 == 0100; 0100 OR 1110 == 1110 => equal to 7, found
2 XOR 7 => 0100 XOR 1110 == 1010; 1010 OR 1110 == 1110 => equal to 7, found
15 XOR 7 => 1111 XOR 1110 == 0001; 0001 OR 1110 == 1111 => not equal to 7, not found

EDIT: even though you won't have an integer that uses 200 bits, you can still use a 200-bit structure and do the same operations, it will just be a bit more of work for you.

Upvotes: 0

Kunukn
Kunukn

Reputation: 2236

You could use subset matching. I assume your dataset are sorted.

Build the subset possibilities from the reference array. Best if the reference array is small, else the dataset to be build and compared would be to big.

E.g. the sorted subset matching of the reference array [a, b, c] gives

[a], [b], [c], [a,b], [a,c], [b,c], [a,b,c]

From the dataset

[
 [a, b, c],
 [b, d],
 [a, c],
 [a, c],
 [b],
 [a, b, c, d]
]

Build a hash table of the data with their position in an array

 hash table
 "abc", [1]
 "bd", [2]
 "ac", [3,4]
 "b", [5]
 "abcd", [6]   

reference array hash

hashReference = {"a","b","c","ab", "ac," "bc", "abc"}

Algorithm

foreach key in hashRefence 
    lookup the hash table

This returns [5], [3,4], [1]

Upvotes: 0

user2566092
user2566092

Reputation: 4661

If you store your reference array as a structure where you can check for membership of an element in O(1) time (e.g., using a hash table or just a 0/1 membership indicator array since the universe is small), then you can store your collection of arrays as a trie where each array is sorted, and traverse the trie finding paths to leaves that are completely contained in your reference array. This can reduce complexity because you'll be traversing simultaneously for a subcollection of arrays using single comparisons higher up in the trie when the arrays in your subcollection overlap, and you'll be skipping subsets of arrays simultaneously when they contain a common element that is not in your reference array.

Upvotes: 2

Related Questions