Reputation: 169
I have a process that works over a large dataset, processing records within a Parallel.ForEach and storing the results in a ConcurrentQueue<List<string>>. Each record is processed, and each field in the record produces a string, which is added to a List. At the end of the record, that List is enqueued, and further processing is done on the ConcurrentQueue holding all the processed records.
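For reference, the setup described above looks roughly like the sketch below. This is only an illustration: the PipelineSketch class, the ProcessAll method, the use of string[] as the record type, and the Trim() call standing in for the real per-field work are all hypothetical names and placeholders, not the actual code.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PipelineSketch
{
    // Hypothetical stand-in for the real pipeline: each record is an array of raw fields.
    public static ConcurrentQueue<List<string>> ProcessAll(IEnumerable<string[]> records)
    {
        var results = new ConcurrentQueue<List<string>>();

        Parallel.ForEach(records, record =>
        {
            // One List per record; it is local to this iteration, so it needs no locking.
            var fields = new List<string>();
            foreach (var field in record)
            {
                fields.Add(field.Trim()); // placeholder for the real per-field processing
            }

            // Enqueue is thread-safe, so parallel iterations can call it concurrently.
            results.Enqueue(fields);
        });

        return results;
    }
}
```

Note that in this shape each List only ever holds the fields of a single record, while the ConcurrentQueue is the collection that grows with the number of records.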
After a couple of hours of processing the set, I have noticed that my CPU usage has gone from rising and falling in waves to staying pretty high, and the time to process a group of records has started to grow.
My assumption here is that the List is filled to capacity and then copied into a new, larger List. As the size grows, the CPU required to keep up with this cycle of hitting capacity and reinitializing grows too. The dataset I'm working with is of indeterminate size, in that each record has a variable number of child records. The number of parent records is usually around 500k.
So my first thought is to initialize the List to the Count of the parent records. The List would still have to grow due to the child records, but it would at least have to grow fewer times. But is there some other collection alternative to List that scales better? Or is there a different approach that would be better than my first instinct?
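To make the pre-sizing idea concrete, here is a minimal sketch that counts how often a List<string> reallocates its backing array with and without an initial capacity. The ListCapacityDemo class, the CountReallocations helper, and the fieldCount value of 1000 are hypothetical and chosen only for illustration; the exact growth policy of List is an implementation detail, so the count for the default case may vary between runtimes.

```csharp
using System;
using System.Collections.Generic;

public static class ListCapacityDemo
{
    // Adds `count` items and counts how many times the list's Capacity changes,
    // which corresponds to a reallocation and copy of the backing array.
    public static int CountReallocations(List<string> list, int count)
    {
        int reallocations = 0;
        int lastCapacity = list.Capacity;

        for (int i = 0; i < count; i++)
        {
            list.Add("field" + i);
            if (list.Capacity != lastCapacity)
            {
                reallocations++;
                lastCapacity = list.Capacity;
            }
        }

        return reallocations;
    }

    public static void Main()
    {
        const int fieldCount = 1000; // hypothetical number of items

        int grown = CountReallocations(new List<string>(), fieldCount);
        int presized = CountReallocations(new List<string>(fieldCount), fieldCount);

        Console.WriteLine($"default capacity: {grown} reallocations");
        Console.WriteLine($"pre-sized:        {presized} reallocations");
    }
}
```

The pre-sized list should not reallocate at all as long as the item count stays within the initial capacity, which is the effect I am hoping for.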
Upvotes: 0
Views: 52
Reputation: 3285
A ConcurrentQueue is implemented as a linked list of segments, so it never has to copy its existing elements into a larger buffer when it grows (unlike the regular Queue, which is backed by a resizable array). So your problem will be elsewhere.
You might want to look into the amount of memory used and rate of garbage collection caused by cleaning up processed Lists.
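One rough way to check this is to log collection counts and heap size at intervals while the job runs. The sketch below uses GC.CollectionCount and GC.GetTotalMemory from the base class library; the GcProbe class and its Snapshot method are hypothetical names made up for this example.

```csharp
using System;

public static class GcProbe
{
    // Returns a one-line summary of how many collections have run per generation
    // so far in this process, plus the current managed heap size estimate.
    public static string Snapshot()
    {
        long heapBytes = GC.GetTotalMemory(false); // false: do not force a collection first
        return $"gen0={GC.CollectionCount(0)} " +
               $"gen1={GC.CollectionCount(1)} " +
               $"gen2={GC.CollectionCount(2)} " +
               $"heapBytes={heapBytes}";
    }

    public static void Main()
    {
        // In the real job this would be logged periodically, e.g. every N records.
        Console.WriteLine(GcProbe.Snapshot());
    }
}
```

If the gen2 count and heap size keep climbing over the hours while throughput drops, memory pressure is a more likely culprit than List resizing.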
Other tips:
Upvotes: 1