Reputation: 65
Suppose we would like to program as efficiently as possible due to certain constraints. Should we put OOP aside? Let's illustrate with an example:
public class CPlayer {
    Vector3 m_position;
    Quaternion m_rotation;
    // other fields
}

public class CPlayerController {
    CPlayer[] players;

    public CPlayerController(int _count) {
        players = new CPlayer[_count];
    }

    public void ComputeClosestPlayer(CPlayer _player) {
        for (int i = 0; i < players.Length; i++) {
            // find the closest player to _player
        }
    }
}
If we convert CPlayer to a struct, we can take advantage of the CPU cache for the player array and get better performance. When we iterate over the array in the ComputeClosestPlayer function, we know the player structs are stored consecutively, so they come into the cache as soon as we read the first element of the array.
public struct CPlayer {
    Vector3 m_position;
    Quaternion m_rotation;
    // other fields
}

public class CPlayerController {
    CPlayer[] players;

    public CPlayerController(int _count) {
        players = new CPlayer[_count];
    }

    public void ComputeClosestPlayer(CPlayer _player) {
        for (int i = 0; i < players.Length; i++) {
            // find the closest player to _player
        }
    }
}
If we want to achieve even more performance, we can separate the position field from the class:
public Vector3[] m_positions;
Now only the positions (12 bytes each) are loaded into the cache when we call the function, whereas in the previous approach we had to cache whole objects that take up more memory.
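For example, the split might look something like this (a rough sketch; the names are hypothetical):

public class CPlayerController {
    Vector3[] m_positions; // hot data: 12 bytes per player, contiguous
    CPlayer[] players;     // the remaining, rarely-needed fields

    public void ComputeClosestPlayer(int _playerIndex) {
        for (int i = 0; i < m_positions.Length; i++) {
            // only m_positions is read here, so each cache line
            // carries nothing but position data
        }
    }
}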
Finally, I do not know whether it is a standard approach, or something you avoid, to separate some fields from a class to get better performance. Please share your approach to getting the most performance in strategy games where you have a lot of items and soldiers.
Upvotes: 2
Views: 722
Reputation: 324
Deciding between a struct or a class can be really tricky. Structs are value types while classes are reference types. Value types are passed by value, meaning the contents of the struct are copied into arrays, function parameters, etc. Reference types are passed by reference, meaning only their reference is passed around. Another performance issue might arise from frequent boxing and un-boxing of value types.
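A quick sketch of what that difference means in practice (the type names are made up for illustration):

struct PointStruct { public int X, Y; }
class PointClass { public int X, Y; }

var s1 = new PointStruct { X = 1, Y = 2 };
var s2 = s1;       // copies the whole struct; s2 is independent of s1
s2.X = 99;         // s1.X is still 1

var c1 = new PointClass { X = 1, Y = 2 };
var c2 = c1;       // copies only the reference; both point at one object
c2.X = 99;         // c1.X is now 99 as well

object boxed = s1; // boxing: the struct is copied onto the heap
var back = (PointStruct)boxed; // un-boxing copies it back out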
I might not explain it well; fortunately, MSDN has a pretty good guide here.
The one thing I would copy here from the guide is this:
✓ CONSIDER defining a struct instead of a class if instances of the type are small and commonly short-lived or are commonly embedded in other objects.
X AVOID defining a struct unless the type has all of the following characteristics:
- It logically represents a single value, similar to primitive types (int, double, etc.).
- It has an instance size under 16 bytes.
- It is immutable.
- It will not have to be boxed frequently.
In all other cases, you should define your types as classes.
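To make that concrete, here's a sketch of a type that would satisfy all four characteristics (my own made-up example, not from the guide):

public readonly struct GridCoord {
    public readonly short X; // 2 bytes
    public readonly short Y; // 2 bytes

    public GridCoord(short x, short y) { X = x; Y = y; }

    // A single logical value, 4 bytes in size, immutable, and unlikely
    // to be boxed in typical usage.
}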
Upvotes: 2
Reputation:
Put aside OO design pattern to achieve better performance in strategic games?
I tend to favor this broad approach of putting aside OOP at the central architecture for visual FX, specifically with an entity-component system like so:

[diagram: independent systems surrounding a central ECS database of components]

... where the components, in blue, are just data (structs with no functionality of their own). If there's any functionality at all inside a component, it's pure data structure functionality (like the functions you find in std::vector in C++ or ArrayList in C#).
It does allow efficient things to be done more easily, but the primary benefit for me wasn't efficiency; it was flexibility and maintainability. When I need brand new behavior, I can just make a small local modification to a system, add a new component, or add a new system, since the wide dependencies flow towards data rather than abstractions. The need to face cascading design changes has been a thing of the past ever since I embraced this approach.
"No Central Design"
Every new design idea, no matter how crazy, tends to be fairly easy to extend and add to the system without breaking any central designs, since there are no central abstract designs in the first place (at least none that contain functionality) besides the ECS database itself. And since each system depends only on the ECS database and a handful of components (raw data), the systems are uber decoupled.
And it makes every system easy to reason about, from thread safety to what side effects go on where and when. Each system performs a very well-defined role that maps very directly to business requirements. It's harder to reason about designs and responsibilities when you have this instead for the communication/interaction between medium and teeny-sized objects:

[diagram: a dense web of small objects all communicating with each other]

... and the graph above is not a dependency diagram. In terms of coupling there might be an abstraction between every object to decouple them that way (ex: a Modeling object depending on IMesh, not a concrete mesh), but they still talk to each other, and all that interaction and communication can make it difficult to reason about what's going on, as well as to come up with the most efficient loopy code.
Meanwhile the former approach, where each independent system processes data from a central database in a flat, pipeline-style fashion, makes it very easy to figure out what's going on, as well as to implement the loopy critical paths of execution very efficiently. It also means you can sit down and work on a system without knowing what the other systems are doing: all you have to do to implement a physics system is read things like motion components from the database and transform the data correctly. You don't need to know much to implement and maintain the physics system besides a few component types and how to fetch them from the ECS "database".
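Something like this, as a rough sketch in C# (all names here are hypothetical, and a real ECS would be considerably more involved):

// Components: plain data with no behavior of their own.
public struct Position { public float X, Y, Z; }
public struct Motion { public float VX, VY, VZ; }

// The central "database" storing components in contiguous arrays.
public class EcsDatabase {
    public Position[] Positions;
    public Motion[] Motions;
    public int Count;
}

// A system is little more than a function over the components it reads.
public static class PhysicsSystem {
    public static void Update(EcsDatabase db, float dt) {
        for (int i = 0; i < db.Count; i++) {
            db.Positions[i].X += db.Motions[i].VX * dt;
            db.Positions[i].Y += db.Motions[i].VY * dt;
            db.Positions[i].Z += db.Motions[i].VZ * dt;
        }
    }
}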
That also makes it easier to work in a team, and to hire new developers who can quickly come up to speed without spending 2 years learning how the central abstractions in the overall system work just so that they can do their job. They can start doing things as bold and as centrally-impacting to the software's design as introducing a brand new physics engine or rendering engine to the system within weeks.
Efficiency
If we convert CPlayer to a struct, we can take advantage of the CPU cache for the player array and get better performance.
That's only at a granular level, where objects start to get in the way of performance. For example, if you try to represent one single pixel of an image with an abstract Pixel object or IPixel interface that encapsulates and hides its data, then it can very easily become a performance barrier even without considering the cost of dynamic dispatch. Such a granular object just tends to force you to work at the granular level of one pixel at a time while fudging about with its public interface, so optimizations like processing the image on the GPU or SIMD processing on the CPU are out when we have this barrier between ourselves and the pixel data.
Aside from that you generally can't code to an interface and expect efficient solutions at this granular of a level (one pixel). We can't hide away concrete details like pixel formats and abstract them away and expect to write efficient video filters that loop through millions of pixels every frame, e.g. At a low enough level, we have to start writing code against concrete details for reasonable efficiency. Abstractions and coding to an interface are helpful as you work towards high-level operations.
But naturally that doesn't apply if you turn Pixel into just raw data stored in an Image. There an image object, which is really a container of often millions of pixels, isn't a practical barrier to achieving a very efficient solution. We don't have to abandon OOP to write very efficient loops. We just might need to do it for the teeniest, most granular objects that store barely any data of their own.
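To make that concrete, here's a sketch of the coarser design (the class and its operations are made up for illustration):

// The coarse object: a container of millions of pixels as raw data.
public class Image {
    readonly byte[] m_pixels; // RGBA8, 4 bytes per pixel, stored contiguously
    readonly int m_width, m_height;

    public Image(int width, int height) {
        m_width = width;
        m_height = height;
        m_pixels = new byte[width * height * 4];
    }

    // Loopy operations run directly over the raw array; there is no
    // per-pixel object or interface in the way.
    public void InvertColors() {
        for (int i = 0; i < m_pixels.Length; i += 4) {
            m_pixels[i]     = (byte)(255 - m_pixels[i]);     // R
            m_pixels[i + 1] = (byte)(255 - m_pixels[i + 1]); // G
            m_pixels[i + 2] = (byte)(255 - m_pixels[i + 2]); // B
            // alpha left untouched
        }
    }
}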
So one alternative strategy is to just model your objects at a coarser level. You don't have to design a Human class. You can design a Humans class which inherits from Creatures. Now the implementation of Humans might consist of loopy, multithreaded SIMD code which processes thousands of humans' worth of data (ex: SoA fields stored in parallel arrays) at once.
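A minimal sketch of what such a coarse class might look like (names are hypothetical; I've left out the Creatures base class and the multithreading for brevity):

public class Humans {
    // SoA fields: one entry per human, stored in parallel arrays.
    float[] m_x, m_y, m_z; // positions
    float[] m_health;

    public Humans(int count) {
        m_x = new float[count];
        m_y = new float[count];
        m_z = new float[count];
        m_health = new float[count];
    }

    // Batch operations process thousands of humans in one tight loop
    // instead of dispatching a call per individual object.
    public void MoveAll(float dx, float dy, float dz) {
        for (int i = 0; i < m_x.Length; i++) {
            m_x[i] += dx;
            m_y[i] += dy;
            m_z[i] += dz;
        }
    }
}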
Alternatives to OOP
The appeal to me of abandoning OOP at the broadest level of the design (I still use OOP plenty for the implementation of each subsystem) in favor of value aggregates at the central level, as with the components in an ECS, is mostly the flexibility of leaving the data wide open and accessible through a central "database". It makes it easier to just implement the systems you need and effectively access the data, without jumping through hoops and going through layers upon layers of abstractions while fighting against the abstractions you designed.
Of course there are downsides, like that your data now has to be very stable (we can't keep changing it) or else your systems will break. And you also have to have a decent system organization to make it so your data is accessed and modified in the minimum number of places to allow you to effectively maintain invariants, reason about side effects, etc. But I've found an ECS does that beautifully just naturally, since it becomes very easy to tell what systems access what components.
In my case, I found a nice fit using ECS for the large-scale design; then, when you zoom in to the implementation of a particular system, like a physics or rendering system, it uses OOP to help implement auxiliary data structures and medium-sized objects to make the system's implementation more comprehensible, and things like that. I find OOP very useful at a medium scale of complexity, but sometimes very difficult to maintain and optimize at the largest scale.
Hot/Cold Field Splitting and Value Aggregates
Finally, I do not know whether it is a standard approach, or something you avoid, to separate some fields from a class to get better performance. Please share your approach to getting the most performance in strategy games where you have a lot of items and soldiers.
This is getting a bit C#-specific and I'm more of a C++ and C programmer, but I believe I've read that C# structs, as long as they aren't boxed, can be stored contiguously in an array. That contiguity can make a world of difference in terms of reducing cache misses.

Specifically in GC languages, the initial set of object allocations can often be done quickly using a sequential allocator ("Eden" space in Java; I believe C# does something similar, though I haven't read any papers on C#'s implementation details) for very efficient GC implementations. But after the first GC cycle, the memory can be shuffled around to allow it to be reclaimed on an individual-object basis. That loss of spatial locality can really hurt performance if you need to do very efficient sequential loops. So storing an array of structs or of primitive data types like int or float might be a useful optimization in some key loopy areas of your game.
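For example (a sketch assuming Unity-style Vector3/Quaternion value types):

// Array of structs: all player data in one contiguous block.
struct PlayerData {
    public Vector3 position;
    public Quaternion rotation;
}
PlayerData[] packed = new PlayerData[1024]; // values stored back-to-back

// Array of class references: only the references are contiguous. The
// objects themselves live on the GC heap and can lose spatial locality
// after a few collection cycles.
CPlayer[] scattered = new CPlayer[1024];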
As for the approach of separating fields, that's useful for SIMD processing and for hot/cold field splitting. Hot/cold field splitting means separating the frequently-accessed data fields from the ones that aren't. For example, a particle system might spend the bulk of its time moving particles around and detecting whether they collide. During those critical paths it has absolutely no interest in things like a particle's color.

So an effective optimization in that case might be to avoid storing color directly inside a particle, and instead hoist it out and store it in its own separate, parallel array. That way the hot data that is constantly accessed can be loaded into a 64-byte cache line without irrelevant data, like color, being loaded into it needlessly and slowing down the critical paths by making them plow through more irrelevant data to get at the relevant data.
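As a sketch (the names are made up, and I'm assuming Unity-style Vector3/Color types):

// Hot data: touched every frame in the critical paths.
struct ParticleHot {
    public Vector3 position;
    public Vector3 velocity;
}

public class ParticleSystem {
    ParticleHot[] m_hot;  // dense, cache-friendly hot fields
    Color[] m_colors;     // cold data, hoisted into a parallel array

    public ParticleSystem(int count) {
        m_hot = new ParticleHot[count];
        m_colors = new Color[count];
    }

    public void Simulate(float dt) {
        for (int i = 0; i < m_hot.Length; i++) {
            // Cache lines here carry only positions and velocities;
            // no color bytes are dragged through the cache.
            m_hot[i].position += m_hot[i].velocity * dt;
        }
    }
}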
All non-trivial optimizations tend to boil down to exchanges that skew performance towards the common case at cost to the rare case. To strike a good bargain and find a good trade, you want to make the common case faster even if it makes the rare case a little bit slower. Beyond blatant inefficiencies, you generally can't make everything fast for everything, though you can achieve something that looks that way and seems super fast for everything to the user if you optimize the common cases, the critical paths, really well.
Memory Access and Representation
If we want to achieve even more performance, we can separate the position field from the class:
public Vector3[] m_positions;
This would be working towards an SoA (Structure of Arrays) approach, and it makes sense if the bulk of your critical loops spend their time accessing the position of a player in sequential or random-access patterns, but not, say, the rotation. If both rotation and position are accessed frequently together in a random-access pattern, a struct storing both, using an AoS (Array of Structures) approach, might make the most sense. If they're both accessed predominantly in a sequential pattern with no random access, then SoA might perform better: not because it reduces cache misses (it would be close to on par with AoS), but because it allows more efficient instruction selection by the optimizer, which can, say, load 8 single-precision position fields at once into a YMM register with no rotation fields interleaved, and apply more homogeneous vertical arithmetic in the processing loops.
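For example, the AoS form of that (a sketch):

// AoS: position and rotation travel together, so one cache line fetch
// brings in both fields. Ideal when both are hot under random access.
struct PlayerTransform {
    public Vector3 position;
    public Quaternion rotation;
}
PlayerTransform[] m_transforms;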
A full-blown SoA approach might even separate your position components. Instead of:
xyzxyzxyzxyz...
It might favor this if the access patterns for critical paths are all sequential and process a large amount of data:
xxxxxxxx...
yyyyyyyy...
zzzzzzzz...
Like so:
float[] m_x;
float[] m_y;
float[] m_z;
That tends to be the friendliest memory layout for all kinds of SIMD instructions (it allows you or the optimizer to use SIMD in a way that looks identical to scalar code, only applied to 4+ fields at a time), though you generally want sequential access patterns for this kind of layout. If access is random, you might end up with almost three times the cache misses.
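Here's a sketch of that kind of loop using C#'s System.Numerics.Vector<T> (the integration step itself is a made-up example, and it assumes x and vx have the same length):

using System.Numerics;

// Advance x positions by per-element velocities, processing
// Vector<float>.Count elements (e.g. 8 floats with AVX) per iteration.
static void IntegrateX(float[] x, float[] vx, float dt) {
    int width = Vector<float>.Count;
    int i = 0;
    for (; i <= x.Length - width; i += width) {
        var pos = new Vector<float>(x, i);
        var vel = new Vector<float>(vx, i);
        (pos + vel * dt).CopyTo(x, i);
    }
    for (; i < x.Length; i++) // scalar tail for the leftover elements
        x[i] += vx[i] * dt;
}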
As for what you choose, first you have to figure out how you're going to be accessing the data in your most critical loops, to understand how to design and represent it effectively. And you generally have to make some exchanges that slow down the rare case in favor of the common case: if you go with an SoA design, you might still have some places in the system that would benefit more from AoS, so you're speeding up the common case (your sequential critical paths) while slowing down the rare case (the non-critical paths using random access). There's a lot to take in and compromises to be made to come up with the most efficient solutions. Naturally it helps to measure, but to design things efficiently upfront you also have to think about memory access patterns. A good memory layout for one access pattern isn't necessarily good for another.
If in doubt, I'd favor the AoS until you hit a hotspot, since it's generally easier to maintain than parallel arrays. Then you might selectively apply SoA optimizations in hindsight. The key is to find the breathing room to do that, which you will find if you design coarser objects: Image, not Pixel; Humans, not Human; ParticleSystem, not Particle. If you design little teeny objects you use everywhere, you might find yourself trapped in a sub-optimal representation you can't change without breaking everything.
Finally, I do not know whether it is a standard approach, or something you avoid, to separate some fields from a class to get better performance [...]
Actually it's very common, and it's discussed fairly widely (it's not too esoteric knowledge) in fields like computer graphics and gaming when using languages that give you very explicit control over memory layouts, like C++. However, the techniques are just as applicable to languages using GC, since those languages can still give you plenty of control over memory layouts in the ways that matter, provided they at least give you something like a struct which can be stored contiguously in an array.
All of this stuff about efficient memory access patterns relates predominantly to contiguity and spatial locality, because we're dealing with loops, and ideally, when we load some data into a cache line, it covers not only the one element we're interested in but also the next one, and the next one, and so forth. We want as much relevant data in the line as possible and as little irrelevant data as possible. Most of it becomes pointless (except for temporal locality) if nothing is stored contiguously, since we'd be loading irrelevant data all over the place every time we load one element; but just about every language out there gives you some way to store data in a purely contiguous fashion, even if you can't use objects for that.
I've actually seen a small interactive path tracer written in Java which rivaled the speed of equally small interactive ones written in C or C++. The main thing was that it avoided using objects for the critical parts involving BVH traversal and ray/triangle intersection; there it used big arrays of floats and integers, but everything else used OOP. If you apply these types of optimizations selectively in such languages, they can start to give extremely impressive results from a performance standpoint without getting bogged down in maintenance issues.
Upvotes: 6