Reputation: 128
Let's say I have n bitmasks (n is arbitrary, probably a number from 9-25). The length of these bitmasks (which is equal for all of them) does not matter, but for the sake of an example, let's say we have 3 bitmasks of length 3:
b1: 1 0 0
b2: 1 1 1
b3: 1 0 1
I now want to map those 3 bitmasks (or n in a general case) to 4 (n + 1) other bitmasks, one for each possible number of occurrences. For this example, the resulting bitmasks should be:
0: 0 0 0
1: 0 1 0
2: 0 0 1
3: 1 0 0
Now the question: Does anybody know a clever strategy to achieve this? The easiest attempt to do this is obviously going through each bit and counting the number of occurrences in the original bitmask set. However, I was wondering if there was some clever way of bit manipulation to keep track of all bits at the same time. Creating intermediate bitmasks is fine. The number of operations should of course be below the "naive" way outlined above. Any ideas?
Upvotes: 0
Views: 221
Reputation: 23424
Depending on how many masks you have and how many bits there are in each mask there might be a way to do this with bit twiddling operations. For up to three 6-bits masks, something like this should do it:
let mut counts = 0;
for mask in &masks {
counts += (mask * 0x0101) & 0xAA55;
}
let mut result = [0; 4];
for i in 0..3 {
result[(counts >> (2 * i)) & 0x3] |= 1 << (2 * i);
result[(counts >> (2 * i + 9)) & 0x3] |= 1 << (2 * i + 1);
}
Here's a better solution that works for up to 255 eight bits masks:
let mut counts = 0;
for &mask in &masks {
counts += ((mask as u64).wrapping_mul (0x8040201008040201) >> 7)
& 0x0101010101010101;
}
let mut result = vec![0; masks.len() + 1];
for i in 0..8 {
result[(counts >> (8*i)) as usize & 0xFF] |= 1 << (7-i);
}
Criterion benchmarks show that this solution goes from slightly faster than a naive implementation for a single input mask, to three times as fast for 256 inputs:
Nb inputs | naive | simd | ratio |
---|---|---|---|
1 | 41 ns | 39 ns | 1.1 |
16 | 86 ns | 43 ns | 2.0 |
256 | 737 ns | 232 ns | 3.2 |
Here's the benchmark code for reference:
use criterion::{ black_box, criterion_group, criterion_main, Criterion };
fn naive (masks: impl Iterator<Item=u8>) -> Vec<u8> {
let mut n = 1;
let mut counts = vec![0; 8];
for mask in masks {
for i in 0..8 {
if (mask >> i) & 0x1 != 0 {
counts[i] += 1;
}
}
n +=1;
}
let mut result = vec![0; n];
for i in 0..8 {
result[counts[i]] |= 1 << i;
}
result
}
fn simd (masks: impl Iterator<Item=u8>) -> Vec<u8> {
let mut n = 1;
let mut counts = 0;
for mask in masks {
counts += ((mask as u64).wrapping_mul (0x8040201008040201) >> 7)
& 0x0101010101010101;
n += 1;
}
let mut result = vec![0; n];
for i in 0..8 {
result[(counts >> (8*i)) as usize & 0xFF] |= 1 << (7-i);
}
result
}
pub fn bench_naive (c: &mut Criterion) {
c.bench_function ("naive - 1", |b| b.iter (|| {
black_box (naive (black_box (0..1))); }));
c.bench_function ("naive - 16", |b| b.iter (|| {
black_box (naive (black_box (0..16))); }));
c.bench_function ("naive - 256", |b| b.iter (|| {
black_box (naive (black_box (0..=255))); }));
}
pub fn bench_simd (c: &mut Criterion) {
c.bench_function ("simd - 1", |b| b.iter (|| {
black_box (simd (black_box (0..1))); }));
c.bench_function ("simd - 16", |b| b.iter (|| {
black_box (simd (black_box (0..16))); }));
c.bench_function ("simd - 256", |b| b.iter (|| {
black_box (simd (black_box (0..=255))); }));
}
criterion_group!(benches, bench_naive, bench_simd);
criterion_main!(benches);
Upvotes: 1