Map set of bitmasks to number of occurrences

Question

Let's say I have n bitmasks (n is arbitrary, probably a number from 9-25). The length of these bitmasks (which is equal for all of them) does not matter, but for the sake of an example, let's say we have 3 bitmasks of length 3:

b1: 1 0 0
b2: 1 1 1
b3: 1 0 1

I now want to map those 3 bitmasks (or n in a general case) to 4 (n + 1) other bitmasks, one for each possible number of occurrences. For this example, the resulting bitmasks should be:

0 has no bits set because no bit in the original bitmasks is set 0 times.
1 has the 2nd bit set because it is set exactly one time in the original set of bitmasks
2 has the 3rd bit set because it is set exactly 2 times in the original set of bitmasks
3 has the 1st bit set because it is set exactly 3 times in the original set of bitmasks

Now the question: Does anybody know a clever strategy to achieve this? The easiest attempt to do this is obviously going through each bit and counting the number of occurrences in the original bitmask set. However, I was wondering if there was some clever way of bit manipulation to keep track of all bits at the same time. Creating intermediate bitmasks is fine. The number of operations should of course be below the "naive" way outlined above. Any ideas?

Jmb · Accepted Answer

Depending on how many masks you have and how many bits there are in each mask there might be a way to do this with bit twiddling operations. For up to three 6-bits masks, something like this should do it:

let mut counts = 0;
for mask in &masks {
    counts += (mask * 0x0101) & 0xAA55;
}

let mut result = [0; 4];
for i in 0..3 {
    result[(counts >> (2 * i)) & 0x3] |= 1 << (2 * i);
    result[(counts >> (2 * i + 9)) & 0x3] |= 1 << (2 * i + 1);
}

Playground

Edit:

Here's a better solution that works for up to 255 eight bits masks:

let mut counts = 0;
for &mask in &masks {
    counts += ((mask as u64).wrapping_mul (0x8040201008040201) >> 7)
            & 0x0101010101010101;
}
    
let mut result = vec![0; masks.len() + 1];
for i in 0..8 {
    result[(counts >> (8*i)) as usize & 0xFF] |= 1 << (7-i);
}

Playground

Criterion benchmarks show that this solution goes from slightly faster than a naive implementation for a single input mask, to three times as fast for 256 inputs:

Nb inputs	naive	simd	ratio
1	41 ns	39 ns	1.1
16	86 ns	43 ns	2.0
256	737 ns	232 ns	3.2

Here's the benchmark code for reference:

use criterion::{ black_box, criterion_group, criterion_main, Criterion };

fn naive (masks: impl Iterator) -> Vec {
   let mut n = 1;
   let mut counts = vec![0; 8];
   for mask in masks {
      for i in 0..8 {
         if (mask >> i) & 0x1 != 0 {
            counts[i] += 1;
         }
      }
      n +=1;
   }

   let mut result = vec![0; n];
   for i in 0..8 {
      result[counts[i]] |= 1 << i;
   }

   result
}

fn simd (masks: impl Iterator) -> Vec {
   let mut n = 1;
   let mut counts = 0;
   for mask in masks {
      counts += ((mask as u64).wrapping_mul (0x8040201008040201) >> 7)
         & 0x0101010101010101;
      n += 1;
   }

   let mut result = vec![0; n];
   for i in 0..8 {
      result[(counts >> (8*i)) as usize & 0xFF] |= 1 << (7-i);
   }

   result
}

pub fn bench_naive (c: &mut Criterion) {
   c.bench_function ("naive - 1", |b| b.iter (|| {
      black_box (naive (black_box (0..1))); }));
   c.bench_function ("naive - 16", |b| b.iter (|| {
      black_box (naive (black_box (0..16))); }));
   c.bench_function ("naive - 256", |b| b.iter (|| {
      black_box (naive (black_box (0..=255))); }));
}

pub fn bench_simd (c: &mut Criterion) {
   c.bench_function ("simd - 1", |b| b.iter (|| {
      black_box (simd (black_box (0..1))); }));
   c.bench_function ("simd - 16", |b| b.iter (|| {
      black_box (simd (black_box (0..16))); }));
   c.bench_function ("simd - 256", |b| b.iter (|| {
      black_box (simd (black_box (0..=255))); }));
}

criterion_group!(benches, bench_naive, bench_simd);
criterion_main!(benches);

Map set of bitmasks to number of occurrences

Answers (1)

Edit:

Related Questions