npengra317
npengra317

Reputation: 126

Bitwise operator on two large sparse vectors without looping?

I have three large arbitrary sparse boolean vectors, all of the same size - say: pool1, pool2, intersection_of_other_pools. I'm interested in performing a bitwise operator. It'd be great if I could do: intersection_of_other_pools |= pool1 | pool2, but that doesn't seem to be an option - as far as I could find.

Since the size of all these vectors are very large, and pool1 and pool2 are very sparse, I'd be interested in a way to perform a bitwise operation on these vectors without looping. I understand that the under-the-hood implementation of std::vector<bool> is just an array of bits, which led me to believe it's possible to do this without looping.

I'm open to strange bitwise hacky solutions in the name of speed.

Of course, if the fastest way (or the only way) to do this is just looping, then I'll happily accept that as an answer as well.

I've checked out valarray as a potential alternative to vector, but I couldn't tell if it looping or is doing some magical bitwise operation. But ideally, I don't want to change the existing codebase.

Upvotes: 0

Views: 272

Answers (2)

skeller
skeller

Reputation: 1161

implement as std::vector<uint64_t>, your cpu will propably be quite fast to do the bitwise "or" on these. They will be memory-aligned, so cache friendly. the loop is not as bad as you think, as there will be a hidden implicit loop anyway on a different data structure.

if its extremly sparse (<< 1 in 1000), then just store the indices of the "set" bits in a (sorted) vector and use std::set_intersection to do the matching

Upvotes: 2

Yakk - Adam Nevraumont
Yakk - Adam Nevraumont

Reputation: 275750

Don't use std::vector<bool> or similar for a sparse array.

A truly sparse array should have the ability to skip over large sections.

Encode your data as block headers, which state how long a region it is in bytes. Use all 1s in the length to say "the length of the length field is twice as long and follows", recursively.

So 0xFF0100 states that there is a block of length 512 following. (you can do a bit better by not permitting 0 nor 1-254, but that is rounding error).

Alternate blocks of "all 0s" with blocks of mixed 1s and 0s.

Don't read the block headers directly; use memcpy into aligned storage.

Once you have this, your | or & operation is more of a stitch than a bitwise operation. Only in the rare case where both have a non-zero block do you actually do bitwise work.

After doing & you might want to check if any of the non-0 regions are actually all 0.

This assumes an extremely sparse bitfield. Like, 1 bit in every few 10000 is set is a typical case. If by sparse you mean "1 in 10", then just use a vector of uint64_t or something.

Upvotes: 2

Related Questions