Reputation: 385
I'd like to improve performance of my Dynamic Linked Library (DLL).
For that I want to use lookup tables of cos() and sin() as I use a lot of them.
As I want maximum performance, I want to create a table from 0 to 2PI that contains the resulting cos and sin computations.
For a good result in term of precision, I think tables of 1 mb for each function is a good trade between size and precision.
I would like to know how to create and uses these tables without using an external file (as it is a DLL) : I want to keep everything within one file.
Also I don't want to compute the sin and cos function when the plugin starts : they have to be computed once and put in a standard vector.
But how do I do that in C++?
EDIT1: code from jons34yp is very good to create the vector files.
I did a small benchmark and found that if you need good precision and good speed you can do a 250000 units vector and linear interpolate between them you will have a 7.89E-11 max error (!) and it is the fastest between all the approximations I tried (and it is more than 12x faster than sin() (13,296 x faster exactly)
Upvotes: 3
Views: 7991
Reputation: 114511
You can use precomputation instead of storing them already precomputed in the executable:
double precomputed_sin[65536];
struct table_filler {
table_filler() {
for (int i=0; i<65536; i++) {
precomputed_sin[i] = sin(i*2*3.141592654/65536);
}
}
} table_filler_instance;
This way the table is computed just once at program startup and it's still at a fixed memory address. After that tsin
and tcos
can be implemented inline as
inline double tsin(int x) { return precomputed_sin[x & 65535]; }
inline double tcos(int x) { return precomputed_sin[(x + 16384) & 65535]; }
Upvotes: 3
Reputation: 153919
The usual answer to this sort of question is to write a small
program which generates a C++ source file with the values in
a table, and compile it into your DLL. If you're thinking of
tables with 128000 entries (128000 doubles are 1MB), however,
you might run up against some internal limits in your compiler.
In that case, you might consider writing the values out to
a file as a memory dump, and mmap
ing this file when you load
the DLL. (Under windows, I think you could even put this second
file into a second stream of your DLL file, so you wouldn't have
to distribute a second file.)
Upvotes: 0
Reputation:
Easiest solution is to write a separate program that creates a .cc
file with definition of your vector.
For example:
#include <iostream>
#include <cmath>
int main()
{
std::ofstream out("values.cc");
out << "#include \"static_values.h\"\n";
out << "#include <vector>\n";
out << "std::vector<float> pi_values = {\n";
out << std::precision(10);
// We only need to compute the range from 0 to PI/2, and use trigonometric
// transformations for values outside this range.
double range = 3.141529 / 2;
unsigned num_results = 250000;
for (unsigned i = 0; i < num_results; i++) {
double value = (range / num_results) * i;
double res = std::sin(value);
out << " " << res << ",\n";
}
out << "};\n"
out.close();
}
Note that this is unlikely to improve performance, since a table of this size probably won't fit in your L2 cache. This means a large percentage of trigonometric computations will need to access RAM; each such access costs roughly several hundreds of CPU cycles.
By the way, have you looked at approximate SSE SIMD trigonometric libraries. This looks like a good use case for them.
Upvotes: 3