Reputation: 41503
So I have megabytes of data stored as doubles that need to be sent over a network... now I don't need the precision that a double offers, so I want to convert these to a float before sending them over the network. What is the overhead of simply doing:
float myFloat = (float)myDouble;
I'll be doing this operation several million times every few seconds and don't want to slow anything down. Thanks
EDIT: My platform is x64 with Visual Studio 2008.
EDIT 2: I have no control over how they are stored.
Upvotes: 10
Views: 11180
Reputation: 41096
As Michael Burr said, while the overhead strongly depends on your platform, the overhead is definitely less than the time needed to send them over the wire.
a rough estimate:
800MBit/s payload on a excellent Gigabit wire, 25M-floats/second.
On a 2GHz single core, that gives you a whopping 80 clock cycles for each value converted to break even - anythign less, and you will save time. That should be more than enough on all architectures :)
A simple load-store cycle (barring all caching delays) should be below 5 cycles per value. With instruction interleaving, SIMD extensions and/or parallelizing on multiple cores, you are likely to do multiple conversions in a single cycle.
Also, the receiver will be happy having to handle only half the data. Remember that memory access time is nonlinear.
The only thing arguing against the conversion would be is if the transfer should have minimal CPU load: a modern architecture could transfer the data from disk/memory to bus without CPU intervention. However, with above numbers I'd say that doesn't matter in practice.
[edit]
I checked some numbers, the 387 coprocessor would indeed have taken around 70 cycles for a load-store cycle. On the initial pentium, you are down to 3 cycles without any parallelization.
So, unless you run a gigabit network on a 386...
Upvotes: 16
Reputation: 264401
Even if it does take time this will not be the slow point in your application.
Your FPU can do the conversion a lot quicker than it can send network traffic (so the bottleneck here will more than likely be the write to the socket).
But as with al things like this measure it and see.
Personally I don't think any time spent here will affect the real time spent sending the data.
Upvotes: 5
Reputation: 36987
I think this cast is a lot cheaper than you think since it doesn't really involve any kind of calculation. In fact, it's just bitshifting to get rid of some of the digits of exponent and mantissa.
Upvotes: 3
Reputation: 12815
Bearing in mind that most compilers deal with doubles
a lot more efficiently than floats
-- many promote float
to double
before performing operations on them -- I'd consider taking the block of data, ZIPping/compressing it, then sending the compressed block across. Depending on what your data looks like, you could get 60-90% compression, vs. the 50% you'd get converting 8-byte values to four bytes.
Upvotes: 3
Reputation: 340208
Assuming that you're talking about a significant number of packets to ship the data (a reasonable assumption if you're sending millions of values) casting the doubles to float will likely reduce the number of network packets by about half (assuming sizeof(double)==8
and sizeof(float)==4
).
Almost certainly the savings in network traffic will dominate whatever time is spent performing the conversion. But as everyone says, measuring some tests will be the proof of the pudding.
Upvotes: 4
Reputation: 8514
It will also depend on the CPU and what floating point support it has. In the bad old days (1980s), processors supported integer operations only. Floating point math had to be emulated in software. A separate chip for floating point (a coprocessor) could be bought separately.
Modern CPUs now have SIMD instructions, so large amounts of floating point data can be processed at once. These instructions include MMX, SSE, 3DNow! and the like. Your compiler may know how to make use of these instructions, but you may need to write your code in a particular way, and turn on the right options.
Finally, the fastest way to process floating point data is in a video card. A fairly new language called OpenCL lets you send tasks to the video card to be processed there.
It all depends on how much performance you need.
Upvotes: 1
Reputation: 3637
You don't have any choice but to measure them yourself and see. You could use timers to measure them. Looks like some has already implemented a neat C++ timer class
Upvotes: 1
Reputation: 8320
It's going to depend on your implementation of C++ libraries. Test it and see.
Upvotes: 7