RobBain85
RobBain85

Reputation: 77

Reducing bandwidth between GPU and CPU( sending raw data or pre calculate first)

OK so I am just trying to work out the best way reduce band width between the GPU and CPU.

Particle Systems.

Should I be pre calculating most things on the CPU and sending it to the GPU this is includes stuff like positions, rotations, velocity, calculations for alpha and random numbers ect.

Or should I be doing as much as i can in the shaders and using the geometry shader as much as possible.

My problem is that the sort of app that I have written has to have a good few variables sent to the shaders for example, A user at run time will select emitter positions and velocity plus a lot more. The sorts of things that I am not sure how to tackle are things like "if a user wants a random velocity and gives a min and max value to have the random value select from, should this random value be worked out on the CPU and sent as a single value to the GPU or should both the min and max values be sent to the GPU and have a random function generator in the GPU do it? Any comments on reducing bandwidth and optimization are much appreciated.

Upvotes: 2

Views: 852

Answers (2)

SigTerm
SigTerm

Reputation: 26429

Should I be pre calculating most things on the CPU and sending it to the GPU this is includes stuff like positions, rotations, velocity, calculations for alpha and random numbers ect.

Or should I be doing as much as i can in the shaders and using the geometry shader as much as possible.

Impossible to answer. Spend too much CPU time and performance will drop. Spend too much GPU time, performance will drop too. Transfer too much data, performance will drop. So, instead of trying to guess (I don't know what app you're writing, what's your target hardware, etc. Hell, you didn't even specify your target api and platform) measure/profile and select optimal method. PROFILE instead of trying to guess the performance. There are AQTime 7 Standard, gprof, and NVPerfKit for that (plus many other tools).

Do you actually have performance problem in your application? If you don't have any performance problems, then don't do anything. Do you have, say ten million particles per frame in real time? If not, there's little reason to worry, since a 600mhz cpu was capable of handling thousand of them easily 7 years ago. On other hand, if you have, say, dynamic 3d environmnet and particles must interact with it (bounce), then doing it all on GPU will be MUCH harder.

Anyway, to me it sounds like you don't have to optimize anything and there's no actual NEED to optimize. So the best idea would be to concentrate on some other things.

However, in any case, ensure that you're using correct way to transfer "dynamic" data that is frequently updated. In directX that meant using dynamic write-only vertex buffers with D3DLOCK_DISCARD|D3DLOCK_NOOVERWRITE. With OpenGL that'll probably mean using STREAM or DYNAMIC bufferdata with DRAW access. That should be sufficient to avoid major performance hits.

Upvotes: 3

terriblememory
terriblememory

Reputation: 1786

There's no single right answer to this. Here are some things that might help you make up your mind:

  1. Are you sure the volume of data going over the bus is high enough to be a problem? You might want to do the math and see how much data there is per second vs. what's available on the target hardware.
  2. Is the application likely to be CPU bound or GPU bound? If it's already GPU bound there's no point loading it up further.
  3. Particle systems are pretty easy to implement on the CPU and will run on any hardware. A GPU implementation that supports nontrivial particle systems will be more complex and limited to hardware that supports the required functionality (e.g. stream out and an API that gives access to it.)
  4. Consider a mixed approach. Can you split the particle systems into low complexity, high bandwidth particle systems implemented on the GPU and high complexity, low bandwidth systems implemented on the CPU?

All that said, I think I would start with a CPU implementation and move some of the work to the GPU if it proves necessary and feasible.

Upvotes: 2

Related Questions