Reputation: 5344
I have been profiling our application and found something obvious: the memory allocator is called a lot and consumes significant time (a few percent). Last year I made the memory allocator many times faster, but I still think I can speed it up some more. So as part of that optimization, I want to speed up the part of the code that quantizes the allocation sizes.
The memory allocator keeps lists of free chunks of memory. There is an array of 832 lists, one list for each allocation size from 0..128k. All allocation requests from 0..128k are converted to one of 832 quanta (is quanta the right word?). The figure of 832 is arbitrary and fell out of the scheme I came up with below: I was balancing a desire not to waste memory against a desire for a high amount of reuse, and I also wanted to use as few bits as possible to store the size of an allocation. In our application, small sizes are requested much more often than large sizes, i.e. reuse is higher for smaller sizes.
Everything is aligned to 8 bytes, so the smallest quantum is 8. I chose to quantize all allocations below 256 bytes to 8 bytes so as not to waste any more RAM than alignment requires. Also, to save space, when memory is added to a list of free memory I use the first 8 bytes of the allocated memory for a next pointer, so I can't go below 8 bytes for that reason either. From 2k..8k the request quantum is 32 bytes, from 8k..32k it's 128 bytes, and from 32k..128k it's 512 bytes. As the request size goes up, you can use larger quanta and still keep the percentage of wasted memory low. Because I have only 832 sizes, reuse is high, even for the larger/rarer allocations.
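For reference, that is where the 832 comes from:

0..2k     in 8-byte steps   : 2048 / 8              = 256 lists
2k..8k    in 32-byte steps  : (8192 - 2048) / 32    = 192 lists
8k..32k   in 128-byte steps : (32768 - 8192) / 128  = 192 lists
32k..128k in 512-byte steps : (131072 - 32768) / 512 = 192 lists

256 + 192 + 192 + 192 = 832.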
Here is the function that quantizes allocation requests. iRecycle is the index into the array of lists; it goes from 0..831.
void GetAlignedSize(QWORD cb, QWORD& cbPack8, WORD& iRecycle) {
    // we assume cb is small, so the first 'if' will be hit the most.
    if (cb < 0x000800 - 0x0007) {               // 0k..2k = 8 byte chunks
        cb += 0x0007; cbPack8 = cb & (~0x0007); // pad to 8
        iRecycle = 000 + WORD(cb >> 3);
    } else if (cb < 0x002000 - 0x001f) {        // 2k..8k = 32 byte chunks
        cb += 0x001f; cbPack8 = cb & (~0x001f); // pad to 32
        iRecycle = 192 + WORD(cb >> 5);
    } else if (cb < 0x008000 - 0x007f) {        // 8k..32k = 128 byte chunks
        cb += 0x007f; cbPack8 = cb & (~0x007f); // pad to 128
        iRecycle = 384 + WORD(cb >> 7);
    } else if (cb < 0x020000 - 0x01ff) {        // 32k..128k = 512 byte chunks
        cb += 0x01ff; cbPack8 = cb & (~0x01ff); // pad to 512
        iRecycle = 576 + WORD(cb >> 9);
    } else {
        cbPack8 = Pack8(cb);
        iRecycle = 0;
    }
}
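For a concrete example: a request of 100 bytes takes the first branch and comes back with cbPack8 = 104 and iRecycle = 13; a request of 3000 bytes takes the second branch and comes back with cbPack8 = 3008 and iRecycle = 192 + 94 = 286.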
Here is the question! How can I do something similar to this with bit manipulation only? I want to get rid of the comparisons because I think they break down CPU pipelining. As long as the quantization increases with size, and the number of sizes below 128k stays small, any scheme is viable. I expect this will eliminate the last case, and iRecycle will then increase without bound, so we can change iRecycle to a different-sized integer.
Thanks for your help!
Upvotes: 2
Views: 161
Reputation:
The obvious thing to do is to use a table. I was sceptical that such a scheme would be quicker, but curious...
...so, to create a baseline I tweaked your function (rendering it in C):
static uint
GetAlignedSize(size_t size, size_t* rsize)
{
  size_t sx = size - 1 ;
  if (size <= 0x00800)
    {
      *rsize = (sx | 0x007) + 1 ;
      return (sx >> 3) + ((256 - 64) * 0) ;
    } ;
  if (size <= 0x02000)
    {
      *rsize = (sx | 0x01F) + 1 ;
      return (sx >> 5) + ((256 - 64) * 1) ;
    } ;
  if (size <= 0x08000)
    {
      *rsize = (sx | 0x07F) + 1 ;
      return (sx >> 7) + ((256 - 64) * 2) ;
    } ;
  if (size <= 0x20000)
    {
      *rsize = (sx | 0x1FF) + 1 ;
      return (sx >> 9) + ((256 - 64) * 3) ;
    } ;
  *rsize = 0 ;
  return 64 + ((256 - 64) * 4) ;
} ;
Note that this does sizes up to and including 0x800 with 8 byte units, etc., and returns "index" 0..831 for all the known sizes, and 832 for the out-size (not 1..832 and 0). In passing, I wonder if it is actually useful to know the rounded size? To free the block you need the "index", and if you do need the rounded size it could be looked up? [Full disclosure: this also assumes that the incoming request size is not zero... which makes a tiny improvement in the timing!]
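Spelled out, the "index" ranges are: sizes 1..0x800 map to 0..255, 0x801..0x2000 to 256..447, 0x2001..0x8000 to 448..639, 0x8001..0x20000 to 640..831, and anything larger to 832.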
Anyway, the most general approach is to table-drive the whole thing:
static uint
get_aligned_size_1(size_t size, size_t* rsize)
{
  static const uint tb[0x40] =
    {
      /* 0x00 */ (0x007 << 16) + ((256 - 64) * 0) + 3,
      /* 0x01 */ (0x01F << 16) + ((256 - 64) * 1) + 5,
      /* 0x02 */ (0x01F << 16) + ((256 - 64) * 1) + 5,
      /* 0x03 */ (0x01F << 16) + ((256 - 64) * 1) + 5,
      /* 0x04 */ (0x07F << 16) + ((256 - 64) * 2) + 7,
      /* 0x05 */ (0x07F << 16) + ((256 - 64) * 2) + 7,
      /* 0x06 */ (0x07F << 16) + ((256 - 64) * 2) + 7,
      /* 0x07 */ (0x07F << 16) + ((256 - 64) * 2) + 7,
      /* 0x08 */ (0x07F << 16) + ((256 - 64) * 2) + 7,
      /* 0x09 */ (0x07F << 16) + ((256 - 64) * 2) + 7,
      /* 0x0A */ (0x07F << 16) + ((256 - 64) * 2) + 7,
      /* 0x0B */ (0x07F << 16) + ((256 - 64) * 2) + 7,
      /* 0x0C */ (0x07F << 16) + ((256 - 64) * 2) + 7,
      /* 0x0D */ (0x07F << 16) + ((256 - 64) * 2) + 7,
      /* 0x0E */ (0x07F << 16) + ((256 - 64) * 2) + 7,
      /* 0x0F */ (0x07F << 16) + ((256 - 64) * 2) + 7,
      /* 0x10 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x11 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x12 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x13 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x14 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x15 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x16 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x17 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x18 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x19 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x1A */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x1B */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x1C */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x1D */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x1E */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x1F */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x20 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x21 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x22 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x23 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x24 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x25 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x26 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x27 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x28 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x29 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x2A */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x2B */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x2C */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x2D */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x2E */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x2F */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x30 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x31 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x32 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x33 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x34 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x35 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x36 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x37 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x38 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x39 */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x3A */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x3B */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x3C */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x3D */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x3E */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
      /* 0x3F */ (0x1FF << 16) + ((256 - 64) * 3) + 9,
    } ;
  size_t sx = size - 1 ;
  if (size <= 0x20000)
    {
      uint tx ;
      tx = tb[sx >> 11] ;
      *rsize = (sx | (tx >> 16)) + 1 ;
      return (sx >> (tx & 0xF)) + (tx & 0xFFF0) ;
    } ;
  *rsize = 0 ;
  return 64 + ((256 - 64) * 4) ;
} ;
where the mask to round the size up, the "index" base and the shift required to create the "index" are all in the table... packed manually into a uint.
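To be explicit about that packing: bits 16..31 of each entry hold the rounding mask, bits 4..15 hold the "index" base (which is always a multiple of 16, so it cannot collide with the shift), and bits 0..3 hold the shift. Something like the following would build an entry (the TB_ENTRY macro is just for illustration; the table above was written out by hand):

/* band 0..3 selects 8/32/128/512 byte units, so the shift is band * 2 + 3  */
#define TB_ENTRY(mask, band) \
  (((mask) << 16) + ((256 - 64) * (band)) + ((band) * 2 + 3))

/* e.g. the 2k..8k rows are TB_ENTRY(0x01F, 1), from which get_aligned_size_1()
 * recovers: tx & 0xF == 5, tx & 0xFFF0 == (256 - 64) * 1 and tx >> 16 == 0x01F.
 */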
I timed the two functions over a random selection of request sizes: 80% 1..0x800, 15% 0x801..0x2000, 4% 0x2001..0x8000, 0.9% 0x8001..0x20000 and 0.1% out-size:
Setup: 15.610 secs: user 15.580 system 0.000 -- 500 million
Branches: 1.910 secs: user 1.910 system 0.000
Table 1: 1.840 secs: user 1.830 system 0.000
The setup loop is:
srand(314159) ;
for (int i = 0 ; i < trial_count ; ++i)
  {
    int r ;
    size_t mx, mn ;
    r = rand() % 1000 ;
    if (r < 800)
      {
        mn = 1 ;
        mx = 0x00800 ;
      }
    else if (r < 950)
      {
        mn = 0x00801 ;
        mx = 0x02000 ;
      }
    else if (r < 990)
      {
        mn = 0x02001 ;
        mx = 0x08000 ;
      }
    else if (r < 999)
      {
        mn = 0x08001 ;
        mx = 0x20000 ;
      }
    else
      {
        mn = 0x20001 ;
        mx = 0x80000 ;
      } ;
    test_values[i] = (rand() % (mx - mn + 1)) + mn ;
  } ;
Observe very carefully the time taken to set up the 500 million test values. The table version runs ever so slightly faster... (gcc 4.8, -O3)... but this is a 4% improvement on a trivial function! Mind you, the table-driven version is more flexible, and more quanta can be added without changing the code or the run time.
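The measurement loop itself is not shown here; a minimal sketch of that sort of harness (the function name and the use of clock() are illustrative, not the exact code behind the numbers above):

#include <stdio.h>
#include <time.h>

static void
time_one(const char* name, uint (*fn)(size_t, size_t*),
                           const size_t* test_values, int trial_count)
{
  size_t   rsize ;
  unsigned sink = 0 ;           /* consume the results so the calls survive -O3 */
  clock_t  t0 = clock() ;
  for (int i = 0 ; i < trial_count ; ++i)
    sink += fn(test_values[i], &rsize) ;
  clock_t  t1 = clock() ;
  printf("%s: %.3f secs (check %u)\n", name,
                              (double)(t1 - t0) / CLOCKS_PER_SEC, sink) ;
} ;

...called as time_one("Table 1", get_aligned_size_1, test_values, trial_count), and so on for each variant.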
FWIW, the mask can (obviously) be constructed from the shift, so:
static uint
get_aligned_size_5(size_t size, size_t* rsize)
{
  static const uint8_t ts[0x40] =
    {
      /*               0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  */
      /* 0x00       */ 3,
      /* 0x01..0x03 */ 5, 5, 5,
      /* 0x04..0x0F */ 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      /* 0x10..0x1F */ 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
      /* 0x20..0x2F */ 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
      /* 0x30..0x3F */ 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
    } ;
  static const uint tb[16] =
    {
      /* 0  */ 0,
      /* 1  */ 0,
      /* 2  */ 0,
      /* 3  */ ((256 - 64) / 2) * (3 - 3),
      /* 4  */ 0,
      /* 5  */ ((256 - 64) / 2) * (5 - 3),
      /* 6  */ 0,
      /* 7  */ ((256 - 64) / 2) * (7 - 3),
      /* 8  */ 0,
      /* 9  */ ((256 - 64) / 2) * (9 - 3),
      /* 10 */ 0,
      /* 11 */ 0,
      /* 12 */ 0,
      /* 13 */ 0,
      /* 14 */ 0,
      /* 15 */ 0,
    } ;
  size_t sx = size - 1 ;
  if (size <= 0x20000)
    {
      uint8_t s ;
      s = ts[sx >> 11] ;
      *rsize = (sx | (((size_t)1 << s) - 1)) + 1 ;
      return (sx >> s) + tb[s] ;
    } ;
  *rsize = 0 ;
  return 64 + ((256 - 64) * 4) ;
} ;
Which I found was the fastest of the variations I tried:
Setup: 15.610 secs: user 15.580 system 0.000 -- 500 million
Branches: 1.910 secs: user 1.910 system 0.000
Table 1: 1.840 secs: user 1.830 system 0.000
Table 5: 1.750 secs: user 1.750 system 0.000
at a whole (read it and sigh) 8% faster :-(.
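Of course, any rearrangement has to give exactly the same "index" and rounded size as the branchy version, and with only 0x20000 in-range sizes an exhaustive comparison costs next to nothing; a quick sketch of such a check:

#include <assert.h>

static void
check_variants(void)
{
  for (size_t size = 1 ; size <= 0x20001 ; ++size)
    {
      size_t r0, r1, r5 ;
      uint   i0, i1, i5 ;
      i0 = GetAlignedSize(size, &r0) ;        /* branchy baseline     */
      i1 = get_aligned_size_1(size, &r1) ;    /* packed uint table    */
      i5 = get_aligned_size_5(size, &r5) ;    /* shift + base tables  */
      assert((i0 == i1) && (i0 == i5)) ;
      assert((r0 == r1) && (r0 == r5)) ;
    } ;
} ;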
Ah well, I was bored.
Upvotes: 1