Reputation: 2907
What's the easiest way to implement a lookup table that checks whether a character is alphabetic, using an array of 256 chars (256 bytes)? I know I can use the isalpha function, but a lookup table can supposedly be more efficient, requiring a single lookup instead of several comparisons. I was thinking of using the character's integer value as the index into the table and reading the answer for that character directly.
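Something like this is what I have in mind (a rough sketch; I've filled the table once with isalpha itself, and the names are just placeholders):
#include <cctype>

// 256-entry table: the index is the character's value 0..255,
// the entry says whether that character is alphabetic.
static bool kAlphaTable[256];

// Fill the table once, e.g. at startup.
static void init_alpha_table() {
    for (int i = 0; i < 256; ++i)
        kAlphaTable[i] = std::isalpha(i) != 0;
}

// After that, the test itself is a single array read.
static inline bool is_alpha_lut(unsigned char ch) {
    return kAlphaTable[ch];
}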
Upvotes: 1
Views: 2776
Reputation: 5344
I've always used this single-compare method (I assume it pipelines better), since it's faster than doing up to four compares. Clearing bit 5 folds an ASCII lowercase letter onto its uppercase counterpart ('a', 0x61, becomes 'A', 0x41), so a single unsigned range check covers both cases:
unsigned((ch&(~(1<<5))) - 'A') <= 'Z' - 'A'
I benchmarked a few different ways and took into account the TLB cache miss for the lookup-table method. I ran the benchmarks on Windows. Here are the times when the character set was '0'..'z':
lookup tbl no tlb miss: 4.8265 ms
lookup table with tlb miss: 7.0217 ms
unsigned((ch&(~(1<<5))) - 'A') <= 'Z' - 'A': 10.5075 ms
(ch>='A' && ch<='Z') || (ch>='a' && ch<='z'): 17.2290 ms
isalpha: 28.0504 ms
You can clearly see that the locale code has a cost.
Here are the times when the character set was 0..255:
tbl no tlb miss: 12.6833 ms
unsigned((ch&(~(1<<5))) - 'A') <= 'Z' - 'A': 29.2403 ms
(ch>='A' && ch<='Z') || (ch>='a' && ch<='z'): 34.8818 ms
isalpha: 78.0317 ms
tbl with tlb miss: 143.7135 ms
The times are longer because more characters were tested. The number of segments I used for the TLB "flush" was also larger in the second test, so the table-lookup method may suffer more from TLB misses than the first run indicates. You can also see that the single-compare method works better when the character is an alpha.
The lookup-table method is the best if you are comparing many characters in a row, but it is not that much better than the single-compare method. If you are only comparing characters here and there, the TLB cache miss might make the table method worse than the single-compare method. The single-compare method works best when the characters are more likely to be alphas.
Here is the code:
#include <windows.h>   // QueryPerformanceCounter/Frequency, BYTE, DWORD, LPCSTR
#include <cstdio>
#include <cstdlib>
#include <cctype>

// Simple QueryPerformanceCounter-based timer: prints the elapsed time for a
// scope when it is destroyed. It has to appear before main() to compile.
class CBench {
public:
    __declspec(noinline) CBench(CBench* p) : m_fAccumulate(false), m_nTicks(0),
        m_cCalls(0), m_pBench(p), m_desc(NULL), m_nStart(GetBenchMark()) { }
    __declspec(noinline) CBench(const char *desc_in, bool fAccumulate=false) :
        m_fAccumulate(fAccumulate), m_nTicks(0), m_cCalls(0), m_pBench(NULL),
        m_desc(desc_in), m_nStart(GetBenchMark()) { }
    __declspec(noinline) ~CBench() {
        __int64 n = (m_fAccumulate) ? m_nTicks : GetBenchMark() - m_nStart;
        if (m_pBench) {
            m_pBench->m_nTicks += n;
            m_pBench->m_cCalls++;
            return;
        } else if (!m_fAccumulate) {
            m_cCalls++;
        }
        __int64 nFreq;
        QueryPerformanceFrequency((LARGE_INTEGER*)&nFreq);
        double ms = ((double)n * 1000) / nFreq;
        printf("%s took: %.4f ms, calls: %d, avg:%f\n", m_desc, ms, m_cCalls,
               ms / m_cCalls);
    }
    __declspec(noinline) __int64 GetBenchMark(void) {
        __int64 nBenchMark;
        QueryPerformanceCounter((LARGE_INTEGER*)&nBenchMark);
        return nBenchMark;
    }
    LPCSTR   m_desc;
    __int64  m_nStart, m_nTicks;
    DWORD    m_cCalls;
    bool     m_fAccumulate;
    CBench*  m_pBench;
};

// Two range compares.
__forceinline bool isalpha2(BYTE ch) {
    return (ch>='A' && ch<='Z') || (ch>='a' && ch<='z');
}

// Single compare: clear bit 5 to fold lowercase onto uppercase, then do one
// unsigned range check.
__forceinline bool isalpha1(BYTE ch) {
    return unsigned((ch&(~(1<<5))) - 'A') <= 'Z' - 'A';
}

bool* aTbl[256];

int main(int argc, char* argv[])
{
    __int64 n = 0, cTries = 100000;
    int ch0 = ' ', ch1 = 'z'+1;   // narrower character set for the first run
    ch0 = 0, ch1 = 256;           // full 0..255 set (comment out to re-run the first test)

    // Having 256 tables instead of one lets us "flush" the TLB.
    // An Intel TLB should have about ~32 entries (depending on model!),
    // so using more than 32 tables should have a noticeable effect.
    for (int i1=0 ; i1<256 ; ++i1) {
        aTbl[i1] = (bool*)malloc(16384);
        for (int i=0 ; i<256 ; ++i)
            aTbl[i1][i] = isalpha(i) != 0;
    }

    { CBench Bench("tbl with tlb miss");
      for (int i=1 ; i<cTries ; ++i) {
          for (int ch = ch0 ; ch < ch1 ; ++ch)
              n += aTbl[ch][ch]; // tlb buster: each ch reads a different table
      }
    }
    { CBench Bench("tbl no tlb miss");
      for (int i=1 ; i<cTries ; ++i) {
          for (int ch = ch0 ; ch < ch1 ; ++ch)
              n += aTbl[0][ch];
      }
    }
    { CBench Bench("isalpha");
      for (int i=1 ; i<cTries ; ++i) {
          for (int ch = ch0 ; ch < ch1 ; ++ch)
              n += isalpha(ch);
      }
    }
    { CBench Bench("unsigned((ch&(~(1<<5))) - 'A') <= 'Z' - 'A'");
      for (int i=1 ; i<cTries ; ++i) {
          for (int ch = ch0 ; ch < ch1 ; ++ch)
              n += isalpha1(ch);
      }
    }
    { CBench Bench("(ch>='A' && ch<='Z') || (ch>='a' && ch<='z')");
      for (int i=1 ; i<cTries ; ++i) {
          for (int ch = ch0 ; ch < ch1 ; ++ch)
              n += isalpha2(ch);
      }
    }
    return (int)n;
}
Upvotes: 4
Reputation: 2611
I think you can implement isalpha a lot more trivially than with a lookup table. Using the fact that the characters 'a'-'z' and 'A'-'Z' are contiguous in ASCII, a simple test like this is sufficient:
char c;
// c gets some value
if (('A' <= c && 'Z' >= c) || ('a' <= c && 'z' >= c)) // c is alpha
Note that this doesn't take into account different locales.
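If locale awareness does matter, one option is the std::isalpha overload from <locale> (a minimal sketch; is_alpha_in is just an illustrative name):
#include <locale>

// Locale-aware check; 'loc' defaults to the global locale.
bool is_alpha_in(char c, const std::locale& loc = std::locale())
{
    return std::isalpha(c, loc);
}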
Upvotes: 0
Reputation: 254471
Remember the first rule of optimisation: don't do it.
Then remember the second rule of optimisation, to be applied very rarely: don't do it yet.
Then, if you really are encountering a bottleneck and you've identified isalpha as the cause, then something like this might be faster, depending on how your library implements the function. You'll need to measure the performance in your environment, and only use it if there really is a measurable improvement. This assumes you don't need to test values outside the range of unsigned char (typically 0...255); you'll need a bit of extra work for that.
#include <cctype>
#include <climits>
class IsAlpha
{
public:
    IsAlpha()
    {
        for (int i = 0; i <= UCHAR_MAX; ++i)
            table[i] = std::isalpha(i);
    }

    bool operator()(unsigned char i) const { return table[i]; }

private:
    bool table[UCHAR_MAX+1];
};
Usage:
IsAlpha isalpha;

for (int i = 0; i <= UCHAR_MAX; ++i)
    assert(isalpha(i) == bool(std::isalpha(i)));
Upvotes: 3
Reputation: 340218
Your compiler library's implementation is likely to be quite efficient and is probably already using a lookup table for most cases, but it also handles some situations that might be a little tricky to get right if you're going to do your own isalpha():
You might not need to handle non-ASCII locales, in which case you might (maybe) be able to improve slightly over the library.
In fact, I wouldn't be surprised if a macro or function that simply returned the result of:
((('a' <= (c)) && ((c) <= 'z')) || (('A' <= (c)) && ((c) <= 'Z')))
might be faster than a table lookup since it wouldn't have to hit memory. But I doubt it would be faster in any meaningful way, and it would be difficult to measure a difference except maybe in a benchmark that did nothing but isalpha() calls (which might also improve the table-lookup results, since the table would likely be in the cache for many of the tests).
And is isalpha() really a bottleneck? For anyone?
Just use the one in your compiler's library.
Upvotes: 3
Reputation: 4779
If you are looking only for alphabetic characters, 'A'..'z', that is far fewer characters than your 256-entry array. You can subtract 'A' (the lowest alphabetic character) from the ASCII character in question, which gives the index into your array; e.g. 'B' - 'A' is 1. If the result is negative, the character is not alpha; if the character is greater than your highest alphabetic character ('z'), it is not alpha either.
If you are using Unicode at all, this method will not work.
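A minimal sketch of that idea, assuming plain ASCII (kSmallTable and is_alpha_small are just illustrative names; the table still has to mark the handful of non-letters between 'Z' and 'a', such as '[' and '_', as false):
#include <cctype>

// Table covering only 'A'..'z' (58 entries) instead of all 256 values.
static bool kSmallTable['z' - 'A' + 1];

static void init_small_table() {
    for (int c = 'A'; c <= 'z'; ++c)
        kSmallTable[c - 'A'] = std::isalpha(c) != 0;   // '[', '_', '`', ... stay false
}

static inline bool is_alpha_small(unsigned char c) {
    // Below 'A' (negative index) or above 'z' (past the end): not alpha.
    if (c < 'A' || c > 'z')
        return false;
    return kSmallTable[c - 'A'];
}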
Upvotes: 0
Reputation: 182649
Actually, according to Plauger in "The Standard C Library" [91], isalpha is oftentimes implemented using a lookup table. That book is really dated, but this might still be the case today. Here's his proposed definition for isalpha:
Function
int isalpha(int c)
{
    return (_Ctype[c] & (_LO|_UP|_XA));
}
Macro
#define isalpha(c) (_Ctype[(int)(c)] & (_LO|_UP|_XA))
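A rough sketch of how such a classification table might be put together (my bit names and values are illustrative, not Plauger's actual _LO/_UP/_XA definitions):
#include <cctype>

// Illustrative classification bits (not the real _LO/_UP/_XA values).
enum {
    MY_LO = 1 << 0,   // lowercase letter
    MY_UP = 1 << 1,   // uppercase letter
    MY_XA = 1 << 2,   // "extra" alphabetic, e.g. locale-specific letters
    MY_DI = 1 << 3    // digit
};

// One extra slot so that c == EOF (-1) can be looked up safely; statics are
// zero-initialized, so the EOF slot has no classification bits set.
static unsigned char myCtypeStorage[257];
static unsigned char* const myCtype = myCtypeStorage + 1;

static void init_ctype_table() {
    for (int c = 0; c < 256; ++c) {
        unsigned char bits = 0;
        if (std::islower(c)) bits |= MY_LO;
        if (std::isupper(c)) bits |= MY_UP;
        if (std::isdigit(c)) bits |= MY_DI;
        myCtype[c] = bits;
    }
}

// isalpha in this scheme is one table read plus one mask test.
static inline int my_isalpha(int c) {
    return myCtype[c] & (MY_LO | MY_UP | MY_XA);
}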
Upvotes: 3