
Reputation: 39

How can I predict how HLSL shaders affect performance?

For instance, if I render a flat surface, I get 1500 frames per second. With a simple blend map the framerate drops to 700 (more than half the framerate gone?!). In the Microsoft DirectX SDK sample "Parallax Occlusion Mapping" the model is a small circle of no more than 200 vertices, yet the framerate drops to 400. What confuses me is that games like Crysis combine all the latest and heaviest shader effects (tessellation is far heavier than parallax mapping), yet I can still get 50 FPS on ultra settings. That is roughly an eighth of the SDK example's framerate, but Crysis renders hundreds of thousands of vertices on screen plus many more effects than the SDK example.

Upvotes: 0

Views: 1114

Answers (2)

mattnewport

Reputation: 14077

Looking at frame rates in a simple sample like this is not a very meaningful performance measure. DirectX is intended to be efficient for game-like workloads where you render a significant amount of geometry per frame at around 60 fps, not for synthetic tests where you render very simple geometry at 100s or 1000s of fps. You can't extrapolate anything terribly useful about the relative performance of rendering particular objects based on simply measuring frame rates in isolation like this. The software and hardware are designed for maximum throughput for a 16.6 ms frame, not for minimum latency.
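To make that concrete, it helps to convert the framerates from the question into frame times. The little program below is purely illustrative arithmetic using the question's own numbers, not a measurement of anything:

    #include <cstdio>

    // Convert frames-per-second readings into milliseconds-per-frame so the cost
    // of an effect can be compared against a 16.6 ms (60 fps) frame budget.
    int main()
    {
        const double flatFps  = 1500.0; // flat surface (figure from the question)
        const double blendFps = 700.0;  // flat surface plus a simple blend map

        const double flatMs  = 1000.0 / flatFps;  // ~0.67 ms per frame
        const double blendMs = 1000.0 / blendFps; // ~1.43 ms per frame

        std::printf("blend map adds about %.2f ms per frame\n", blendMs - flatMs);
        std::printf("that is about %.1f%% of a 16.6 ms frame budget\n",
                    (blendMs - flatMs) / 16.6 * 100.0);
        return 0;
    }

Halving the framerate sounds alarming, but in absolute terms the blend map costs well under a millisecond here, a small fraction of a 60 fps frame budget.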

When looking at DirectX performance you need to remember that a lot is going on in parallel. In a typical game the CPU will be issuing draw commands for frame n while the GPU is rendering frame n-1. The GPU pipeline may be shading pixels for a triangle from one draw call while rasterizing triangles from a different call and processing vertices from yet another call, all on different hardware units working concurrently. When you look at the frame rate for rendering something simple in isolation, you are not making efficient use of the hardware; much of it will be sitting idle for significant periods of time. The hardware is designed for maximum throughput when working on lots of things at once, and much of its power is wasted when you only give it one or a few things to work on at a time.
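If you want to know what a set of draw calls actually costs the GPU, it is better to measure GPU time directly rather than infer it from wall-clock framerate. Here is a minimal Direct3D 11 sketch using timestamp queries; it assumes you already have an ID3D11Device and ID3D11DeviceContext, and it waits for the results immediately, which is fine for one-off profiling but would stall a real game:

    #include <d3d11.h>
    #include <cstdio>

    // Measure how long the GPU itself spends on a block of draw calls, instead
    // of inferring cost from the wall-clock framerate.
    void MeasureGpuTime(ID3D11Device* device, ID3D11DeviceContext* context)
    {
        ID3D11Query *disjoint = nullptr, *tsBegin = nullptr, *tsEnd = nullptr;

        D3D11_QUERY_DESC qd = {};
        qd.Query = D3D11_QUERY_TIMESTAMP_DISJOINT;
        device->CreateQuery(&qd, &disjoint);
        qd.Query = D3D11_QUERY_TIMESTAMP;
        device->CreateQuery(&qd, &tsBegin);
        device->CreateQuery(&qd, &tsEnd);

        context->Begin(disjoint);
        context->End(tsBegin);              // timestamp queries only use End()

        // ... issue the draw calls you want to measure here ...

        context->End(tsEnd);
        context->End(disjoint);

        // The GPU runs behind the CPU, so the results are not ready immediately.
        D3D11_QUERY_DATA_TIMESTAMP_DISJOINT dj = {};
        while (context->GetData(disjoint, &dj, sizeof(dj), 0) == S_FALSE) { /* wait */ }

        UINT64 t0 = 0, t1 = 0;
        context->GetData(tsBegin, &t0, sizeof(t0), 0);
        context->GetData(tsEnd,   &t1, sizeof(t1), 0);

        if (!dj.Disjoint)
            std::printf("GPU time: %.3f ms\n",
                        double(t1 - t0) / double(dj.Frequency) * 1000.0);

        tsEnd->Release(); tsBegin->Release(); disjoint->Release();
    }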

As you note, modern games like Crysis manage to render very complex scenes at 60 fps on high-end GPUs. If there is a particular scene you are trying to render or an effect you are trying to create on one of those GPUs, it will very likely be fast enough for your needs without any great optimization effort on your part. If you do reach a point where performance becomes an issue for your scene, there are performance tools like PIX, nvPerfHUD, etc. that you can use to track down and optimize the bottlenecks.

Upvotes: 1

Ani

Reputation: 10906

To put it quite simply: optimized game engines push as much data as possible while doing as little work per frame as possible. IMO, the drop in framerate for the sample is not just a reflection of how hard the shader/code works the GPU/CPU, but also of the fact that the sample is not as optimized (it is meant to be readable, not ultra-fast) as the engines you are comparing it to.

It is wrong to directly compare tessellation and pixel-pushing. Modern game engines often generate a great deal of the detail you can see on the GPU itself, which is good for framerates. Also, when you're in a game world you have no idea whether that "highly" tessellated rock you can see is a carefully crafted illusion or whether it really is made up of that many polygons at all.

If you were to tweak things, I bet the DX sample could be made super fast too. But then it would no longer be as readable, nor a pure illustration of the technique at hand.

Also, note that adding tasks does not linearly increase the amount of time taken. Try rendering two surfaces. How much does the framerate drop?
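As a purely illustrative model (the numbers below are invented, not measured), think of each frame as paying a fixed overhead for things like Present, clears, and driver work, plus an incremental cost per surface. Doubling the surfaces then comes nowhere near halving the framerate:

    #include <cstdio>

    // Illustrative cost model only (made-up numbers): each frame pays a fixed
    // overhead plus a per-surface cost, so adding surfaces does not scale the
    // frame time, or the framerate, linearly.
    int main()
    {
        const double fixedOverheadMs = 0.50; // hypothetical per-frame overhead
        const double perSurfaceMs    = 0.17; // hypothetical cost of one surface

        for (int surfaces = 1; surfaces <= 4; ++surfaces)
        {
            const double frameMs = fixedOverheadMs + surfaces * perSurfaceMs;
            std::printf("%d surface(s): %.2f ms -> %.0f fps\n",
                        surfaces, frameMs, 1000.0 / frameMs);
        }
        return 0;
    }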

Upvotes: 1
