MarKol4

Reputation: 341

How does CLR GC compare to latest ZGC and Shenandoah GC on JVM?

In recent years many features have been added to C# (the number one language in the .NET world) to reduce GC pressure. These features undeniably let us build better and more efficient applications. But regardless of which language and VM (CLR, JVM) features are added over time, a performant, non-blocking GC remains a key performance factor for managed applications.

Recently two new GCs have emerged in the JVM world, and they seem to deliver remarkable metrics. There are sources (including the authors) that provide benchmarks and technical insights about those GCs. From them we learn that the maximum STW (stop-the-world) pause is "promised" to be no longer than 10 ms and typically stays below 1 ms on average, regardless of heap size. There are also tests showing that the overhead of the new GCs is well balanced: it does not noticeably hurt application throughput, while STW pauses are drastically reduced (by a factor of 10 or more).
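For anyone who wants to check such numbers on their own workload, here is a minimal Java sketch that prints what the JVM itself reports through the standard `java.lang.management` API. The class name `GcSnapshot` and the method `describeCollectors` are made up for illustration. Note the assumption: for concurrent collectors the accumulated time reported by a bean is not necessarily pure STW pause time, so treat the output as a rough indicator and cross-check it against GC logs (e.g. `-Xlog:gc` on JDK 9+).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration; not part of any library.
final class GcSnapshot {

    // Returns one line per collector bean: name, collection count, accumulated time.
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();  // -1 if the collector does not report it
            long timeMs = gc.getCollectionTime();  // approximate, in ms; -1 if undefined
            lines.add(gc.getName() + ": collections=" + count + ", accumulatedMs=" + timeMs);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Allocate short-lived garbage so that a collection is likely (not guaranteed) to run.
        long sink = 0;
        for (int i = 0; i < 200_000; i++) {
            byte[] junk = new byte[1024];
            sink += junk.length;
        }
        System.out.println("allocated bytes: " + sink);
        describeCollectors().forEach(System.out::println);
    }
}
```

The bean names depend on which collector the JVM was started with (for example `-XX:+UseZGC` or `-XX:+UseShenandoahGC`), so running the same program under different collectors gives a first, coarse comparison.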

On the other hand, there is very little information about the CLR GCs. Are there any up-to-date sources showing how the CLR GCs (.NET Framework 4.8, .NET Core 3.1, .NET 5) compare to the latest JVM achievements? I could only find some older sources that discussed CLR GC vs G1. But today G1 is no match for ZGC/Shenandoah, and the old sources do not reflect the current state of things. Given the lack of newer sources, one could conclude that CLR GC metrics have not improved significantly since then. That would be a real problem for the .NET platform in 2020, because an average STW pause of around 20-30 ms with occasional jumps to 300+ ms looks really bad next to the 1 ms average and 10 ms maximum pauses on the JVM (as the GC authors claim and tests seem to confirm).

I must say this worries me a little, because there is a whole class of applications where GC pauses matter a lot. In fact they are one of the key factors that decide whether a given technology (e.g. .NET, JVM, native, ...) is viable for a task or purpose. It looks like the latest GCs on the JVM open new areas for Java and other JVM languages/technologies. Areas where we do not allow the possibility that the application will stop for 500 ms or so because the GC has to do its work, whereas a ~10 ms maximum with a ~1 ms average is good enough.
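To make "the application stops for N ms" concrete: one runtime-agnostic way to observe stalls from the application's point of view is a small "hiccup" meter that asks to sleep for a fixed interval and records how much later than requested it wakes up. Below is a minimal Java sketch of that idea. The class `HiccupMeter` and its method `measure` are hypothetical names, and the measured overshoot includes OS scheduling jitter and timer granularity, not just GC pauses, so it is an upper bound on what the GC alone is responsible for.

```java
import java.util.concurrent.locks.LockSupport;

// Hypothetical helper class for illustration; not part of any library.
final class HiccupMeter {

    // Parks for intervalNanos, `samples` times, and records how much longer
    // than requested each wake-up took.
    // Returns { maxOvershootNanos, meanOvershootNanos }.
    static long[] measure(long intervalNanos, int samples) {
        long max = 0;
        long total = 0;
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            LockSupport.parkNanos(intervalNanos);
            // parkNanos may return early, so clamp negative overshoot to zero.
            long overshoot = Math.max(0, System.nanoTime() - start - intervalNanos);
            max = Math.max(max, overshoot);
            total += overshoot;
        }
        long mean = samples > 0 ? total / samples : 0;
        return new long[] { max, mean };
    }

    public static void main(String[] args) {
        // Sample once per millisecond for roughly one second.
        long[] result = measure(1_000_000L, 1_000);
        System.out.println("max stall (ms):  " + result[0] / 1_000_000.0);
        System.out.println("mean stall (ms): " + result[1] / 1_000_000.0);
    }
}
```

In practice the meter would run on its own thread next to the real workload; comparing its maximum against a pause budget (say 10 ms) shows whether the process as a whole stays within it.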

What is the truth today? How does the CLR GC compare to the latest JVM GCs? Are there any guarantees regarding STW pauses on the CLR (the JVM seems to be heading in that direction)?

Upvotes: 9

Views: 4681

Answers (2)

anon

Reputation: 29

Due to inherent design differences, it's pretty likely that the JVM's latest garbage collectors will drastically outperform the CLR's. The main reason is that the CLR by default handles memory slightly better, which is an area where the JVM suffers a little. As a result, there hasn't been much pressure to make the CLR's GC blazing fast, since a simple design suffices and does pretty well. By contrast, the JVM does need better GCs, and in fact might have the most advanced ones to date. Although this likely won't make the JVM orders of magnitude faster than the CLR overall, if you pit their GCs against each other the CLR will stand no chance, especially against newer JVM GCs like ZGC and Shenandoah.

Upvotes: 2

Kevin Gosse

Reputation: 39027

That's a big subject, but if I had to summarize:

  • Shenandoah, ZGC, and so on are low-latency garbage collectors: they sacrifice throughput to ensure a very low pause time. They fit some kinds of applications (basically, anything that has low-latency constraints) but not others (to take an extreme example, for a batch job you don't care about latency at all and want to maximize throughput, which makes those GCs a bad choice).
  • To this day, .NET has no low-latency GC. I've heard there are some long-term plans to implement one, but I doubt we'll see anything for at least 2 more years.
  • The .NET GC takes a very different approach from the Java GCs. Java GCs can be tuned very finely, at the cost of considerable complexity. The .NET GC aims at "just working": there are few settings, but they are easy to understand and to leverage. This is less and less true, though, as .NET Core has added a bunch of configuration knobs, for instance to tune the gen 0 budget.

"Areas where we do not allow the possibility that the application will stop for 500 ms or so because the GC has to do its work, whereas a ~10 ms maximum with a ~1 ms average is good enough."

From my non-representative experience, you can expect anywhere between 5 and 15 ms of GC pause time per gen 0 collection when using .NET. If you're aiming for ~1 ms, you probably need to disable the GC altogether. I know that some companies are doing high-frequency trading in .NET, which shows it's possible, but only because they can afford to restart the servers outside of market hours. If you need a sustained ~1 ms pause time, then .NET is not ready for that.

Upvotes: 11
