Reputation: 31146
The question is
Are there any other (and/or faster) implementations of a basic 2log?
Applications
The log2(int) and log2(float) operations are very useful in a lot of different contexts. To name a few: compression algorithms, 3d engines and machine learning. In almost all of these contexts they are used in the low-level code that is called billions of times... Especially the log2(int) operation is very useful.
Because I find myself using log2 all the time, I don't want to give a specific application I'm working on here. What is the same is the fact that this is a real performance drainer (as shown by performance tests of various applications). For me it's key to get this as fast as possible.
The complete source code to test all implementations is added at the bottom, so you can see for yourself.
And of course... run your tests at least 3 times and make sure the counters are big enough to hit multiple seconds. Also I do the 'add' operation to ensure the whole loop isn't magically removed by the JIT'ter. So let's get started with the real work.
Trivial implementation
The trivial implementation of a 2log in C# is:
(int)(Math.Log(x) / Math.Log(2))
This implementation is trivial, but also very slow. It requires 2 Log operations, that are in itself quite slow already. Of course, we can optimize this by making 1.0/Math.Log(2)
a constant.
Note that we need to modify this constant a bit to get the right results (as a result of floating point errors) or add a small number to get the correct results. I chose the latter, but it doesn't really matter - the end result is slow in all cases.
Table lookup
A faster solution for this is to use a lookup table. While you can use a lookup table of any power of 2, I usually use a table size of 256 or 64K entries.
First we create the lookup table:
lookup = new int[256];
for (int i = 1; i < 256; ++i)
{
lookup[i] = (int)(Math.Log(i) / Math.Log(2));
}
Next, we implement the 2log as follows:
private static int LogLookup(int i)
{
if (i >= 0x1000000) { return lookup[i >> 24] + 24; }
else if (i >= 0x10000) { return lookup[i >> 16] + 16; }
else if (i >= 0x100) { return lookup[i >> 8] + 8; }
else { return lookup[i]; }
}
As you can see, table lookups are a much, much faster implementation - but as a con it cannot be used to calculate log2(float)
.
Branch removal
As we all know, processors aren't very good at branching, so I figured that table lookups can be improved by removing the branches. Instead of the bunches of if's I introduced a second table with the values and shift bits around to find the entry in the table:
nobranch = new int[16] { 0, 0, 8, 8, 16, 16, 16, 16, 24, 24, 24, 24, 24, 24, 24, 24 };
private static int LogDoubleLookup(int i)
{
int n = (i | (i >> 4));
n = (n | (n >> 2));
n = (n | (n >> 1));
n = ((n & 0x1000000) >> 21) | ((n & 0x10000) >> 14) | ((n & 0x100) >> 7) | (n & 1);
int br = nobranch[n];
return lookup[i >> br] + br;
}
If you run this test, you will find that it is actually slower than the if-then-else solution.
And then there was the Intel 80386
Intel understood years ago that this is an important operation, so they implemented Bit-Scan-Forward (BSF) into their processors. Other processors have similar instructions. This is by far the fastest way to do a 2log that I know of - but unfortunately I know of now way to use these nice functions from C#... I don't like the idea of having an implementation that doesn't run anymore when a new tablet or phone hits the market - and I don't know of any cross-platform solution that enables me to use this function directly.
Other implementations
As l4V pointed out (thanks!) there are a couple of other implementations, specifically:
TestTrivial
.TestFloat
TestDeBruijn
TestBinary
Apart that I like the name, the DeBruijn lookup tables are just as fast as the normal lookup tables, making it one of the fastest algorithms here... all the other algorithms I've tried are much slower.
Complete test code
public class Log2Test
{
public static void TestNaive()
{
Stopwatch sw = new Stopwatch();
sw.Start();
int n = 0;
for (int i = 1; i < 100000000; ++i)
{
n += (int)(Math.Log(i) / Math.Log(2.0));
}
sw.Stop();
Console.WriteLine("Result: {0} - naive implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
}
public static int LogTrivialLoop(int v)
{
int r = 0;
while ((v >>= 1) > 0) // unroll for more speed...
{
r++;
}
return r;
}
public static void TestTrivialLoop()
{
Stopwatch sw = new Stopwatch();
sw.Start();
int n = 0;
for (int i = 1; i < 100000000; ++i)
{
n += LogTrivialLoop(i);
}
sw.Stop();
Console.WriteLine("Result: {0} - loop implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
}
public static int LogFloat(int v)
{
Helper h = new Helper() { U1 = v, U2 = 0x43300000 };
h.D -= 4503599627370496.0;
return (h.U2 >> 20) - 0x3FF;
}
public static void TestFloat()
{
Stopwatch sw = new Stopwatch();
sw.Start();
int n = 0;
for (int i = 1; i < 100000000; ++i)
{
n += LogFloat(i);
}
sw.Stop();
Console.WriteLine("Result: {0} - IEEE float implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
}
[StructLayout(LayoutKind.Explicit)]
private struct Helper
{
[FieldOffset(0)]
public int U1;
[FieldOffset(4)]
public int U2;
[FieldOffset(0)]
public double D;
}
public static void TestConstant()
{
double c = 1.0 / Math.Log(2.0);
Stopwatch sw = new Stopwatch();
sw.Start();
int n = 0;
for (int i = 1; i < 100000000; ++i)
{
n += (int)(0.00000000001 + Math.Log(i) * c);
}
sw.Stop();
Console.WriteLine("Result: {0} - naive 2 implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
}
private static int LogLookup(int i)
{
if (i >= 0x1000000) { return lookup[i >> 24] + 24; }
else if (i >= 0x10000) { return lookup[i >> 16] + 16; }
else if (i >= 0x100) { return lookup[i >> 8] + 8; }
else { return lookup[i]; }
}
public static void TestLookup()
{
lookup = new int[256];
for (int i = 1; i < 256; ++i)
{
lookup[i] = (int)(Math.Log(i) / Math.Log(2));
}
Stopwatch sw = new Stopwatch();
sw.Start();
int n = 0;
for (int i = 1; i < 100000000; ++i)
{
n += LogLookup(i);
}
sw.Stop();
Console.WriteLine("Result: {0} - table lookup implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
}
private static int LogDoubleLookup(int i)
{
int n = (i | (i >> 4));
n = (n | (n >> 2));
n = (n | (n >> 1));
n = ((n & 0x1000000) >> 21) | ((n & 0x10000) >> 14) | ((n & 0x100) >> 7) | (n & 1);
int br = nobranch[n];
return lookup[i >> br] + br;
}
public static void TestDoubleLookup()
{
// Lookup table was already constructed earlier
Stopwatch sw = new Stopwatch();
sw.Start();
int n = 0;
for (int i = 1; i < 100000000; ++i)
{
n += LogDoubleLookup(i);
}
sw.Stop();
Console.WriteLine("Result: {0} - double table lookup implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
}
private static int LogBinary(int v)
{
/* This is the worst implementation ever... - apparently C# is a slow-branching language
int[] b = { 0x2, 0xC, 0xF0, 0xFF00, 0x7FFF0000 };
int[] S = { 1, 2, 4, 8, 16 };
int r = 0; // result of log2(v) will go here
for (int i = 4; i >= 0; i--) // unroll for speed...
{
if ((v & b[i]) != 0)
{
v >>= S[i];
r |= S[i];
}
}
return r;
*/
int r = (((v > 0xFFFF)) ? 0x10 : 0);
v >>= r;
int shift = ((v > 0xFF) ? 0x8 : 0);
v >>= shift;
r |= shift;
shift = ((v > 0xF) ? 0x4 : 0);
v >>= shift;
r |= shift;
shift = ((v > 0x3) ? 0x2 : 0);
v >>= shift;
r |= shift;
r |= (v >> 1);
return r;
}
public static void TestBinary()
{
// Lookup table was already constructed earlier
Stopwatch sw = new Stopwatch();
sw.Start();
int n = 0;
for (int i = 1; i < 100000000; ++i)
{
n += LogBinary(i);
}
sw.Stop();
Console.WriteLine("Result: {0} - binary search implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
}
private static readonly int[] MultiplyDeBruijnBitPosition = new int[32]
{
0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
};
private static int LogDeBruijn(int v)
{
v |= v >> 1; // first round down to one less than a power of 2
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
return MultiplyDeBruijnBitPosition[(uint)(v * 0x07C4ACDDU) >> 27];
}
public static void TestDeBruijn()
{
// Lookup table was already constructed earlier
Stopwatch sw = new Stopwatch();
sw.Start();
int n = 0;
for (int i = 1; i < 100000000; ++i)
{
n += LogDeBruijn(i);
}
sw.Stop();
Console.WriteLine("Result: {0} - de Bruijn implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
}
private static int[] lookup;
private static readonly int[] nobranch = new int[16] { 0, 0, 8, 8, 16, 16, 16, 16, 24, 24, 24, 24, 24, 24, 24, 24 };
static void Main(string[] args)
{
TestConstant();
TestNaive();
TestDeBruijn();
TestBinary();
TestFloat();
TestTrivialLoop();
TestLookup();
TestDoubleLookup();
Console.ReadLine();
}
}
Upvotes: 29
Views: 16419
Reputation: 3820
There have been quite a few answers providing fast approximate approaches to log2(int)
but few for log2(float)
, so here's two (Java implementation given) that use both a lookup table and mantissa/bit hacking:
/**
* Calculate the logarithm to base 2, handling special cases.
*/
public static float log2(float x) {
final int bits = Float.floatToRawIntBits(x);
final int e = (bits >> 23) & 0xff;
final int m = (bits & 0x7fffff);
if (e == 255) {
if (m != 0) {
return Float.NaN;
}
return ((bits >> 31) != 0) ? Float.NaN : Float.POSITIVE_INFINITY;
}
if ((bits >> 31) != 0) {
return (e == 0 && m == 0) ? Float.NEGATIVE_INFINITY : Float.NaN;
}
return (e == 0 ? data[m >>> qm1] : e + data[((m | 0x00800000) >>> q)]);
}
Note:
/**
* Calculate the logarithm using base 2. Requires the argument be finite and
* positive.
*/
public static float fastLog2(float x) {
final int bits = Float.floatToRawIntBits(x);
final int e = (bits >> 23) & 0xff;
final int m = (bits & 0x7fffff);
return (e == 0 ? data[m >>> qm1] : e + data[((m | 0x00800000) >>> q)]);
}
This second method forgoes the checking present in the other method and therefore has the following special cases:
Both methods upon rely on a lookup table data
(and variables q
and qm1
). These are populated with the following method. n
defines the accuracy-space tradeoff.
static int q, qm1;
static float[] data;
/**
* Compute lookup table for a given base table size.
*
* @param n The number of bits to keep from the mantissa. Table storage =
* 2^(n+1) * 4 bytes, e.g. 64Kb for n=13. Must be in the range
* 0<=n<=23
*/
public static void populateLUT(int n) {
final int size = 1 << (n + 1);
q = 23 - n;
qm1 = q - 1;
data = new float[size];
for (int i = 0; i < size; i++) {
data[i] = (float) (Math.log(i << q) / Math.log(2)) - 150;
}
}
populateLUT(12); // call once, globally
log2(6666); // = 12.702606
Upvotes: 3
Reputation: 8867
Another log2(int) function: (no longer the fastest)
[StructLayout(LayoutKind.Explicit)]
private struct ConverterStruct
{
[FieldOffset(0)] public int asInt;
[FieldOffset(0)] public float asFloat;
}
public static int Log2(uint val)
{
ConverterStruct a; a.asInt = 0; a.asFloat = val;
return ((a.asInt >> 23 )+ 1) & 0x1F;
}
Notes: The inspiration for using the exponent in a float is from SPWorley 3/22/2009. Use with caution on production code since this would fail on architectures that are not little-endianness.
If you want something "endianness" safe then check out spender 5/3/2012. It also has Zero support.
The fastest is the new built-in BitOperations.Log2(x)
Here are some benchmarks: (code here: https://github.com/SunsetQuest/Fast-Integer-Log2)
Function Time1 Full-32-Bit Zero? FUNCTION
BitOperationsLog2 2 Yes Yes BitOperations.Log2(x);
LeadingZeroCount 2 Yes Yes 31 - BitOperations.LeadingZeroCount(x);
Log2_SunsetQuest5 16 Yes No ((BitConverter.DoubleToInt64Bits(val)>>52)+1) & 0xFF;
Log2_WiegleyJ 17 Yes Yes ...
MostSigBit_spender 17 Yes Yes ...
Log2_SPWorley 17 Yes Yes ...
Log2_SunsetQuest4 18 Yes No ...
FloorLg2_Matthew_Watson 18 Yes Yes ...
Log2_SunsetQuest3 19 Yes No ...
Log2_SunsetQuest1 20 Yes Yes ...
Log2_HarrySvensson 20 Yes Yes ...
Log2_DanielSig 21 No Yes ...
HighestBitUnrolled_Kaz 25 Yes Yes ...
FloorLog2_SN17 36 Yes Yes ...
Log2_Papayaved 44 Yes Yes ...
GetMsb_user3177100 45 Yes Yes ...
Log2_Flynn1179 57 Yes Yes ...
Msb_Protagonist 63 Yes Yes ...
SomeOtherMethod 76 Yes Yes ...
Log2_SunsetQuest0 98 Yes Yes ...
Log2_SunsetQuest2 131 Yes Yes ...
SomeOtherMethod 202 Yes Yes ...
SomeOtherMethod 545 Yes Yes ...
Zero_Support = Supports Neg Return on Zero
Full-32-Bit = Supports full 32-bit (some just support 31 bits)
SomeOtherMethod = name of function/person left out on purpose
Benchmark notes: AMD Ryzen, Release, no-debugger attached, .net 6.0
Upvotes: 2
Reputation: 8867
clean reliable and fast!
(requires .net core 3 or later)
int val = BitOperations.Log2(x);
Upvotes: 3
Reputation: 28839
(I haven't done any measurements so this may not match up, but I thought user user9337139's idea was neat and wanted to try the same in C# - his is C++).
Here's a C# int Magnitude(byte)
function based on converting the byte value to float and extracting the exponent from the IEEE float representation.
using System.Runtime.InteropServices;
[StructLayout(LayoutKind.Explicit)]
struct UnionWorker
{
[FieldOffset(0)]
public int i;
[FieldOffset(0)]
public float f;
}
static int Magnitude(byte b)
{
UnionWorker u;
u.i = 0; // just to please the compiler
u.f = b;
return Math.Max((u.i >> 23) & 0xFF, 126) - 126;
}
Returns zero for zero, 8 for 0xFF, other values as you would expect.
Zero is a special case, so I needed the Math.Max
clamp for that. I suspect user9337139's solution might have a similar problem.
Note, this has not been tested for endianness issues - caveat emptor.
Upvotes: 0
Reputation: 1
static byte FloorLog2(UInt16 value)
{
for (byte i = 0; i < 15; ++i)
{
if ((value >>= 1) < 1)
{
return i;
}
}
return 15;
}
Upvotes: 0
Reputation: 21
inline int fast_log2(register double x)
{
return (reinterpret_cast<uint64_t&>(x) >> 52) - 1023;
};
Upvotes: 2
Reputation: 51
Took the binary solution already mentioned and removed the branching. Did some testing and it turned out to be 1.3 times faster than DeBruijn.
public static int Log2(int v)
{
int r = 0xFFFF - v >> 31 & 0x10;
v >>= r;
int shift = 0xFF - v >> 31 & 0x8;
v >>= shift;
r |= shift;
shift = 0xF - v >> 31 & 0x4;
v >>= shift;
r |= shift;
shift = 0x3 - v >> 31 & 0x2;
v >>= shift;
r |= shift;
r |= (v >> 1);
return r;
}
Upvotes: 5
Reputation: 51903
For more algorithms look here http://www.asmcommunity.net/forums/topic/?id=15010
Also did some testing in C++ and my implementation of BSR is slower than lookup table
code:
//---------------------------------------------------------------------------
DWORD log2_slow(const DWORD &x)
{
DWORD m,i;
if (!x) return 0;
if (x>=0x80000000) return 31;
for (m=1,i=0;m<x;m<<=1,i++);
if (m!=x) i--;
return i;
}
//---------------------------------------------------------------------------
DWORD log2_asm(const DWORD &x)
{
DWORD xx=x;
asm {
mov eax,xx
bsr eax,eax;
mov xx,eax;
}
return xx;
}
//---------------------------------------------------------------------------
BYTE _log2[2048]=
{
0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
};
DWORD log2(const DWORD &x)
{
if (x>=0x00400000) return _log2[x>>22]+22;
else if (x>=0x00000800) return _log2[x>>11]+11;
else return _log2[x];
}
//---------------------------------------------------------------------------
test code:
DWORD x,j,i,n=256;
tbeg(); for (i=0;i<32;i++) for (j=0;j<n;j++) x=log2 (j<<i); tend(); mm_log->Lines->Add(tstr(1));
tbeg(); for (i=0;i<32;i++) for (j=0;j<n;j++) x=log2_asm (j<<i); tend(); mm_log->Lines->Add(tstr(1));
tbeg(); for (i=0;i<32;i++) for (j=0;j<n;j++) x=log2_slow(j<<i); tend(); mm_log->Lines->Add(tstr(1));
my results on AMD A8-5500 3.2 GHz:
[ 0.040 ms] log2 (x) - 11bit lookup table
[ 0.060 ms] log2_asm (x) - BSR
[ 0.415 ms] log2_slow(x) - shift loop
Note:
Upvotes: 2
Reputation: 109792
There are some integer algorithms here.
In C#:
public static uint FloorLog2(uint x)
{
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
return (uint)(NumBitsSet(x) - 1);
}
public static uint CeilingLog2(uint x)
{
int y = (int)(x & (x - 1));
y |= -y;
y >>= (WORDBITS - 1);
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
return (uint)(NumBitsSet(x) - 1 - y);
}
public static int NumBitsSet(uint x)
{
x -= ((x >> 1) & 0x55555555);
x = (((x >> 2) & 0x33333333) + (x & 0x33333333));
x = (((x >> 4) + x) & 0x0f0f0f0f);
x += (x >> 8);
x += (x >> 16);
return (int)(x & 0x0000003f);
}
private const int WORDBITS = 32;
You should look at the original code on the site I linked for the context, particularly what happens with Log2(0).
Upvotes: 3