Reputation: 117
I have run into a problem that I can probably circumvent by arranging my algorithm differently, but it's quite interesting and maybe one of you has a good idea.
The situation is as follows: I have two lists of unsigned long integers. Both lists have the same size, and if it helps, you can assume that this size is a power of two. The lists usually contain several hundred elements. Now I want to compute an integer that has a set bit in every position in which the first list, counted over all its elements, has more set bits than the second list.
Speed is everything.
Simplified example:
list1 list2
1010 0101
1111 0000
1100 0011
1010 0101
result: 1010
because of 4>0, 2<=2, 3>1, 1<=3
Edit: The alternative arrangement of data would result in bit vectors that each contain what are now the bits at one position across several different vectors. In that case I could just use a bit counting algorithm and then compare, which would amount to fewer than 30 operations per 64 bits in both lists. Basically I have a matrix of bits and I can use the bit vectors for either the columns or the rows.
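For illustration, the transposed arrangement could be sketched roughly as follows. The layout (cols1[p * words + w] holding word w of the column for bit position p), the function names, and the portable popcount are all assumptions made for this sketch, not part of the original setup; it also assumes at most 64 bit positions.

```c
#include <stdint.h>
#include <stddef.h>

/* Portable 64-bit population count (classic SWAR reduction). */
static unsigned popcount64(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);
}

/* Hypothetical layout: cols[p * words + w] is word w of the bit vector
 * that collects bit position p of every list element.
 * Assumes positions <= 64. */
uint64_t more_set_bits_transposed(const uint64_t *cols1, const uint64_t *cols2,
                                  size_t positions, size_t words)
{
    uint64_t result = 0;
    for (size_t p = 0; p < positions; p++) {
        unsigned c1 = 0, c2 = 0;
        for (size_t w = 0; w < words; w++) {
            c1 += popcount64(cols1[p * words + w]);
            c2 += popcount64(cols2[p * words + w]);
        }
        /* set bit p only if list 1 has strictly more set bits there */
        if (c1 > c2)
            result |= (uint64_t)1 << p;
    }
    return result;
}
```

For the simplified example, the columns of list1 are {0010, 1011, 0110, 1111} for positions 0 to 3 and those of list2 are {1101, 0100, 1001, 0000}, which gives 1010 again.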
Additional structure: John Willemse's comment made me realise that I could calculate a third list, so that these three lists complement each other bitwise. Though I don't see how that would be helpful.
Upvotes: 3
Views: 238
Reputation: 64903
You can do it with transposed counters: instead of having an int for each bit position of the data, you have a uint for each bit position of the count. Hopefully you don't need too many bits.
You can then do addition/subtraction the way they are defined over bitvectors, with each "bit" really being a slice of that bit position across all counts.
Perhaps this sounds vague, so let's just jump right in: (not tested)
// add in item from list2
carry0 = count0 & item2;
count0 ^= item2;
carry1 = count1 & carry0;
count1 ^= carry0;
// ... and so on for however many bits you need in your counters
// subtract item from list1
borrow0 = ~count0 & item1;
count0 ^= item1;
borrow1 = ~count1 & borrow0;
count1 ^= borrow0;
// ... and so on
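The result is the sign of each counter, i.e. the last (most significant) counter word you're using. Since list2 is added and list1 is subtracted, a set sign bit marks a position where list1 has more set bits.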
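Putting the carry and borrow chains above into loops, a minimal sketch might look like this. The function name and COUNTER_BITS are made up for illustration; 11 counter bits are an assumption that covers lists of up to 1023 items, since each counter is a two's complement value that must hold the final difference.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumption: 11 bits per counter, i.e. a range of -1024..1023,
 * which is enough for lists of up to 1023 items. */
#define COUNTER_BITS 11

/* Returns a word whose bit p is set where list1 has more set bits
 * than list2 at position p. */
uint64_t more_set_bits(const uint64_t *list1, const uint64_t *list2, size_t n)
{
    /* count[k] holds bit k of all 64 counters (one counter per position) */
    uint64_t count[COUNTER_BITS] = {0};

    for (size_t i = 0; i < n; i++) {
        /* add the item from list2: ripple the carry through the slices */
        uint64_t carry = list2[i];
        for (int k = 0; k < COUNTER_BITS; k++) {
            uint64_t next = count[k] & carry;
            count[k] ^= carry;
            carry = next;
        }
        /* subtract the item from list1: ripple the borrow */
        uint64_t borrow = list1[i];
        for (int k = 0; k < COUNTER_BITS; k++) {
            uint64_t next = ~count[k] & borrow;
            count[k] ^= borrow;
            borrow = next;
        }
    }
    /* each counter is (#bits in list2) - (#bits in list1);
     * a set sign bit means list1 has more set bits */
    return count[COUNTER_BITS - 1];
}
```

With the lists {1010, 1111, 1100, 1010} and {0101, 0000, 0011, 0101} from the question this yields 1010. An early exit once carry or borrow becomes zero would shorten the inner loops, but whether that pays off depends on the data.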
Or, completely different: maybe you can use sub-fields of an int, SWAR style. That only works if the fields are small or you don't need many, because there isn't much space. With 4-bit items it's not so bad: a uint32_t offers 4 counters that range from -128 to 127, which might be enough (the final difference must be in that range; intermediate results can wrap safely).
Anyway, how it would work is that you spread the bits out with either a lookup table or pdep (not tested):
uint32_t spread = _pdep_u32(item, 0x01010101);
// or
uint32_t table[] = {
0x00000000, 0x00000001, 0x00000100, 0x00000101,
0x00010000, 0x00010001, 0x00010100, 0x00010101,
0x01000000, 0x01000001, 0x01000100, 0x01000101,
0x01010000, 0x01010001, 0x01010100, 0x01010101 };
uint32_t spread = table[item];
Then do SWAR addition or subtraction. It can be optimized a bit, because you know that each field is only incremented, decremented, or left unchanged (not tested):
// sp2 and sp1 are the spread items from list2 and list1
uint32_t H = 0x80808080;
// add in spread item 2
count = ((count & ~H) + sp2) ^ (count & H);
// subtract spread item 1
count = ((count | H) - sp1) ^ (~count & H);
The result is the sign of every sub-field, which is easy to extract but annoying to compress (unless you have pext).
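 
As a rough end-to-end sketch of the SWAR variant for 4-bit items: the function names are invented for this example, the spreading is done with plain shifts instead of pdep or a table, and the sign bits are compressed with shifts instead of pext so that it compiles without BMI2. It assumes the final per-position difference fits in -128..127.

```c
#include <stdint.h>
#include <stddef.h>

/* Spread the 4 bits of item so that bit k lands in the low bit of byte k
 * (same result as _pdep_u32(item, 0x01010101)). */
static uint32_t spread4(uint32_t item)
{
    return  (item & 1u)
         | ((item & 2u) << 7)
         | ((item & 4u) << 14)
         | ((item & 8u) << 21);
}

/* Returns a 4-bit value with bit p set where list1 has more set bits
 * than list2 at position p. Only the low 4 bits of each item are used. */
uint32_t more_set_bits_swar4(const uint8_t *list1, const uint8_t *list2, size_t n)
{
    const uint32_t H = 0x80808080u;
    uint32_t count = 0;   /* four signed 8-bit counters packed in one word */

    for (size_t i = 0; i < n; i++) {
        uint32_t sp2 = spread4(list2[i] & 0xFu);
        uint32_t sp1 = spread4(list1[i] & 0xFu);
        /* per-byte add of sp2, without carries crossing byte boundaries */
        count = ((count & ~H) + sp2) ^ (count & H);
        /* per-byte subtract of sp1, without borrows crossing boundaries */
        count = ((count | H) - sp1) ^ (~count & H);
    }

    /* sign bits sit at bits 7, 15, 23, 31; move them down to bits 0..3 */
    uint32_t signs = (count & H) >> 7;
    return (signs | (signs >> 7) | (signs >> 14) | (signs >> 21)) & 0xFu;
}
```

For the lists from the question ({10, 15, 12, 10} and {5, 0, 3, 5}) this gives 10, i.e. 1010.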
Upvotes: 1
Reputation: 3408
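It may not be the most efficient, but this is the first solution that springs to mind. It is O(n) in the number of elements for a fixed number of bits, since it makes one pass over both lists per bit position.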
int list1[4] = {10, 15, 12, 10};
int list2[4] = {5, 0, 3, 5};
int i, j;
int result = 0;
int num_bits = 4;
int num_elements = 4;
for (i = num_bits - 1; i >= 0; i--)
{
int bit_pos_ans = 0;
for (j = 0; j < num_elements; j++)
{
/* This works by adding the 1s in list1, and subtracting the 1s in list 2 */
bit_pos_ans += (((list1[j] >> i) & 0x1) - ((list2[j] >> i) & 0x1));
}
/* If there are more 1s in list1 than in list2, then this bit position is a 1. */
if (bit_pos_ans > 0)
{
result += 1;
}
/* Only shift if this is not calculating bit position 0 */
if (i > 0)
{
result <<= 1;
}
}
printf("%d", result);
Upvotes: 0