Reputation: 377
I am maintaining the code for a Go project that reads and writes a lot of data and has done so successfully for some time. Recently, I made a change: a CSV file with about 2 million records is now loaded into a map with struct values at the beginning of the program. This map is only used in part B, but part A is executed first, and part A now runs noticeably slower than before (processing time has quadrupled). That is very strange, since that part of the logic did not change. I have spent a week trying to explain how this can happen. Here are the steps I have taken (when I mention performance, I always refer to part A, which does not include the time to load the data into memory and actually has nothing to do with it):
Here I have plotted the metrics with and without the data in memory:
What could cause this effect or how do I find it out?
Upvotes: 1
Views: 221
Reputation: 76395
So if I get this right, your flow looks something like this:
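A minimal sketch of that flow, as I understand it. The struct fields, function names, and map sizes here are all assumptions for illustration, since the question doesn't show code:

```go
package main

import "fmt"

// record stands in for the per-row struct; the real fields are unknown.
type record struct{ id, payload string }

func main() {
	// Step 1: load ~2 million CSV rows into a map up front (simulated here
	// by building the map directly instead of parsing a file).
	m := make(map[string]record, 2_000_000)
	for i := 0; i < 2_000_000; i++ {
		k := fmt.Sprintf("row-%d", i)
		m[k] = record{id: k}
	}

	// Step 2: run part A, which never reads m but now runs slower.
	partA()

	// Step 3: only now is the map actually used.
	partB(m)
}

func partA() { fmt.Println("part A done") }

func partB(m map[string]record) { fmt.Println("part B uses", len(m), "records") }
```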
Why read the data before you need it would be my first question, but that's perhaps beside the point.
What is likely is that the 2 million structs in the map are routinely being scanned by the garbage collector. Depending on the value of GOGC, the pacer component of the garbage collector is likely to kick in more often as the amount of allocated memory increases. Because this map is set aside for later use, there's nothing for the GC to do, but it's taking up cycles checking the data regardless. There are a number of things you could do to verify and account for this behaviour - all of them can help you rule out/confirm whether or not garbage collection is slowing you down.
You could, for example:
- Temporarily disable garbage collection with debug.SetGCPercent(-1) and check whether part A returns to its old speed.
- Keep the data in a sync.Pool. This is a type designed for you to keep stuff you'll manage manually, and move outside of regular GC cycles.

Upvotes: 1