Lucas Manzke
Lucas Manzke

Reputation: 128

Map set of bitmasks to number of occurrences

Let's say I have n bitmasks (n is arbitrary, probably a number from 9-25). The length of these bitmasks (which is equal for all of them) does not matter, but for the sake of an example, let's say we have 3 bitmasks of length 3:

b1: 1 0 0
b2: 1 1 1
b3: 1 0 1

I now want to map those 3 bitmasks (or n in a general case) to 4 (n + 1) other bitmasks, one for each possible number of occurrences. For this example, the resulting bitmasks should be:

0: 0 0 0
1: 0 1 0
2: 0 0 1
3: 1 0 0

Now the question: Does anybody know a clever strategy to achieve this? The easiest attempt to do this is obviously going through each bit and counting the number of occurrences in the original bitmask set. However, I was wondering if there was some clever way of bit manipulation to keep track of all bits at the same time. Creating intermediate bitmasks is fine. The number of operations should of course be below the "naive" way outlined above. Any ideas?

Upvotes: 0

Views: 221

Answers (1)

Jmb
Jmb

Reputation: 23424

Depending on how many masks you have and how many bits there are in each mask there might be a way to do this with bit twiddling operations. For up to three 6-bits masks, something like this should do it:

let mut counts = 0;
for mask in &masks {
    counts += (mask * 0x0101) & 0xAA55;
}

let mut result = [0; 4];
for i in 0..3 {
    result[(counts >> (2 * i)) & 0x3] |= 1 << (2 * i);
    result[(counts >> (2 * i + 9)) & 0x3] |= 1 << (2 * i + 1);
}

Playground

Edit:

Here's a better solution that works for up to 255 eight bits masks:

let mut counts = 0;
for &mask in &masks {
    counts += ((mask as u64).wrapping_mul (0x8040201008040201) >> 7)
            & 0x0101010101010101;
}
    
let mut result = vec![0; masks.len() + 1];
for i in 0..8 {
    result[(counts >> (8*i)) as usize & 0xFF] |= 1 << (7-i);
}

Playground

Criterion benchmarks show that this solution goes from slightly faster than a naive implementation for a single input mask, to three times as fast for 256 inputs:

Nb inputs naive simd ratio
1 41 ns 39 ns 1.1
16 86 ns 43 ns 2.0
256 737 ns 232 ns 3.2

Here's the benchmark code for reference:

use criterion::{ black_box, criterion_group, criterion_main, Criterion };

fn naive (masks: impl Iterator<Item=u8>) -> Vec<u8> {
   let mut n = 1;
   let mut counts = vec![0; 8];
   for mask in masks {
      for i in 0..8 {
         if (mask >> i) & 0x1 != 0 {
            counts[i] += 1;
         }
      }
      n +=1;
   }

   let mut result = vec![0; n];
   for i in 0..8 {
      result[counts[i]] |= 1 << i;
   }

   result
}

fn simd (masks: impl Iterator<Item=u8>) -> Vec<u8> {
   let mut n = 1;
   let mut counts = 0;
   for mask in masks {
      counts += ((mask as u64).wrapping_mul (0x8040201008040201) >> 7)
         & 0x0101010101010101;
      n += 1;
   }

   let mut result = vec![0; n];
   for i in 0..8 {
      result[(counts >> (8*i)) as usize & 0xFF] |= 1 << (7-i);
   }

   result
}

pub fn bench_naive (c: &mut Criterion) {
   c.bench_function ("naive - 1", |b| b.iter (|| {
      black_box (naive (black_box (0..1))); }));
   c.bench_function ("naive - 16", |b| b.iter (|| {
      black_box (naive (black_box (0..16))); }));
   c.bench_function ("naive - 256", |b| b.iter (|| {
      black_box (naive (black_box (0..=255))); }));
}

pub fn bench_simd (c: &mut Criterion) {
   c.bench_function ("simd - 1", |b| b.iter (|| {
      black_box (simd (black_box (0..1))); }));
   c.bench_function ("simd - 16", |b| b.iter (|| {
      black_box (simd (black_box (0..16))); }));
   c.bench_function ("simd - 256", |b| b.iter (|| {
      black_box (simd (black_box (0..=255))); }));
}

criterion_group!(benches, bench_naive, bench_simd);
criterion_main!(benches);

Upvotes: 1

Related Questions