Reputation: 616
I was told that using trigonometric table could be much faster than invoking trigonometric functions like sin and cos when needed because trigonometric functions are expensive by using Taylor expansion and observed that some fortran in-house codes were using trigonometric table.
I have made a sample code to confirm it in C++ and found that using cosinTable is about 2 times faster than invoking cosin function. However, combining cosineTable and a hash function which maps given radian to cosTable index showed much worse performance than invoking cosin function itself.
Seeing that using cosinTable and a hash function is much slower than invoking cosin function and the hash function cost is over 2 times expensive, I am wondering why in-house codes were using table instead of invoking function. Also, I am wondering if the hash function could be improved. Any comments would be appreciated.
#include <cmath>
#include <iostream>
#include <chrono>
const int nTableSize = 1000000;
float cosTable[nTableSize];
const float pi = atan2(1., 1.)*4.;
int hashFunction(float x)
{
return static_cast<int>(fmod(x/(2*pi), 1.)*nTableSize);
}
int main(int argc, char** argv)
{
//build cosine table
for(int i=0; i<nTableSize; ++i)
{
cosTable[i] = cos(i*2*pi/nTableSize);
}
int nComponentWaves = 1000;
std::chrono::system_clock::time_point start = std::chrono::system_clock::now();
float ntheta = 2*pi/nComponentWaves;
float eta = 0;
for(int i=0; i<nComponentWaves; ++i)
{
eta += cos(ntheta*i);
}
std::chrono::system_clock::time_point end = std::chrono::system_clock::now();
std::chrono::nanoseconds duration =
std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
std::cout << "cosine function elapsed time (ns): " << duration.count() << std::endl;
start = std::chrono::system_clock::now();
ntheta = 2*pi/nComponentWaves;
eta = 0;
for(int i=0; i<nComponentWaves; ++i)
{
int index = hashFunction(ntheta*i);
eta += cosTable[index];
/*
eta += cosTable[i];
*/
}
end = std::chrono::system_clock::now();
duration =
std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
std::cout << "cosine table elapsed time (ns): " << duration.count() << std::endl;
return 0;
}
Makefile is as follows:
CXX=g++
CXXFLAGS=-std=c++11 -O3
LDFLAGS=
INCLUDE=-I.
LIBS=-L. -lm
SRC=main.cpp
OBJ=${SRC:%.cpp=%.o}
OUT=main.exe
%.o: %.cpp
${CXX} ${CXXFLAGS} ${INCLUDE} -c $<
all: build
build:${OBJ}
${CXX} ${CXXFLAGS} ${INCLUDE} ${LDFLAGS} ${LIBS} -o ${OUT} $^
clean:
rm ${OBJ} ${OUT}
.PHONY: all build clean
and the result with hash function is
cosine function elapsed time (ns): 56
cosine table elapsed time (ns): 15522
FYI, the result without hash function is
cosine function elapsed time (ns): 77
cosine table elapsed time (ns): 36
Upvotes: 0
Views: 163
Reputation: 131
Just a few ideas for you...
1) Google fmod, there are many links to questions about its performance compared to operator%
2) Thus, consider using operator % on the integer rather than fmod on the float
3) You have a massive table, so accesses are more than likely required to access main memory, especially non-consecutive accesses - consider playing with the table size to fit in local cache sizes. Consider if your accesses are sequential or not. Consider what happens when you access an item not already in the cache.
4) Mathematically, to reduce the table size even more, consider storing just one quadrant of the cos function - you can obtain all 4 quadrants from just the one, thus considerably reducing the table size. Google Range Reduction of trig functions.
5) Also, consider how you break the cosine's range into equal intervals, this is not efficient in terms of storage. That is, you should need fewer items where the cos doesn't change much, and more at the turning points. Consider using equal y values rather than equal x values. Consider using variable intervals based on rates of change of cos.
6) The conception that implementing this in a table should be trivial, is not! Consider using very fast approximating functions for cos, of which there are MANY many examples online.
7) Edited in: Also consider compiler vectorisation of your hash function in a loop. Using fmod will make your loop non-vectorizable. Something else to consider!
Hope this helps
Upvotes: 1