Alex Godofsky
Alex Godofsky

Reputation: 708

Data structure for set of (non-disjoint) sets

I'm looking for a data structure that roughly corresponds to (in Java terms) Map<Set<int>, double>. Essentially a set of sets of labeled marbles, where each set of marbles is associated with a scalar. I want it to be able to efficiently handle the following operations:

under the following conditions:

I've considered tries, but I don't see a good way to handle the "remove every set that contains a given integer" operation.

The purpose of this data structure would be to represent discrete random variables and permit addition, multiplication, and scalar multiplication operations on them. Each of these discrete random variables would ultimately have been created by applying these operations to a fixed (at compile-time) set of independent Bernoulli random variables (i.e. each takes the value 1 or 0 with some probability).

The systems being modeled are close to being representable as a time-inhomogeneous Markov chains (which would of course simplify this immensely) but, unfortunately, it is essential to track the duration since various transitions.

Upvotes: 8

Views: 1053

Answers (1)

aa333
aa333

Reputation: 2576

Here's a data structure, that can do all of your operations pretty efficiently:

I'm going to refer to it as a BitmapArray for this explanation.

Thinking about it, apparently for just the operations you have described a sorted array with bitmaps as keys and weights(your doubles) as values will be pretty efficient.

The bitmaps are what maintain membership in your set. Since you said the range of integers in the set are between 1-10,000, we can maintain information about any set with a bitmap of length 10,000.

It's gonna be tough sorting an array where the keys can be as big as 2^10000, but you can be smart about implementing the comparison function in the following way:

  • Iterate from left to right on the two bitmaps
  • XOR the bits on each index
  • Say you get a 1 at ith position
  • Whichever bitmap has 1 at ith position is greater
  • If you never get a 1 they're equal

I know this is still a slow comparison. But not too slow, Here's a benchmark fiddle I did on bitmaps with length 10000. This is in Javascript, if you're going to write in Java, it's going to perform even better.

    function runTest() {
    var num = document.getElementById("txtValue").value;
    num = isNaN(num * 1) ? 0 : num * 1;

    /*For integers in the range 1-10,000 the worst case for comparison are any equal integers which will cause the comparision to iterate over the whole BitArray*/
    bitmap1 = convertToBitmap(10000, num);
    bitmap2 = convertToBitmap(10000, num);

    before = new Date().getMilliseconds();
    var result = firstIsGreater(bitmap1, bitmap2, 10000);
    after = new Date().getMilliseconds();
    alert(result + " in time: " + (after-before) + " ms");

}


function convertToBitmap(size, number) {
    var bits = new Array();
    var q = number;
    do {
        bits.push(q % 2);
        q = Math.floor(q / 2);
    } while (q > 0);


    xbitArray = new Array();
    for (var i = 0; i < size; i++) {
        xbitArray.push(0);
    }

    var j = xbitArray.length - 1;
    for (var i = bits.length - 1; i >= 0; i--) {
        xbitArray[j] = bits[i];
        j--
    }
    return xbitArray;
}

function firstIsGreater(bitArray1, bitArray2, lengthOfArrays) {
    for (var i = 0; i < lengthOfArrays; i++) {
        if (bitArray1[i] ^ bitArray2[i]) {
            if (bitArray1[i]) return true;
            else return false;
        }
    }
    return false;
}

document.getElementById("btnTest").onclick = function (e) {
    runTest();
};

Also, remember that you only have to do this once, when building your BitmapArray (or while taking unions) and then it's going to become pretty efficient for the operations you'd do most often:

Note: N is the length of the BitmapArray.

Add integer to every set: Worst/best case O(N) time. Flip a 0 to 1 in each bitmap.

Remove every set that contains a given integer: Worst case O(N) time.

  • For each bitmap check the bit that represents the given integer, if 1 mark it's index.
  • Compress the array by deleting all marked indices.

If you're okay with just setting the weights to 0 it'll be even more efficient. This also makes it very easy if you want to remove all sets that have any element in a given set.

Union of two maps: Worst case O(N1+N2) time. Just like merging two sorted arrays, except you have to be smart about comparisons once more.

Multiply all of the doubles by a given double: Worst/best case O(N) time. Iterate and multiply each value by the input double.

Iterate over the BitmapArray: Worst/best case O(1) time for next element.

Upvotes: 1

Related Questions