Ricky

Reputation: 91

TPL Dataflow BatchBlock: removing duplicate elements

My Dataflow pipeline starts with a BatchBlock, and several Tasks post items into it. The BatchBlock propagates data to the next block on a Timer, using the TriggerBatch() method.

You can assume that no batch ever reaches the (very high) batch size given when the BatchBlock was created, so each triggered batch may be a different size.

Just before triggering the BatchBlock I would like to remove all duplicate items present in the batch that is about to be propagated to the next block in the pipeline. Is there a way I can do that?

Upvotes: 3

Views: 683

Answers (1)

i3arnon

Reputation: 116548

You can't add or remove items that are stored inside blocks.

However, you can add a TransformBlock after the BatchBlock that removes duplicates from each batch and passes the batch on. Keep in mind that the resulting batches may be smaller than the ones the BatchBlock emitted.

Assuming the item type implements its equality members (Equals and GetHashCode) correctly, it can look like this:

var transformBlock = new TransformBlock<int[], IEnumerable<int>>(batch => new HashSet<int>(batch));
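
To show where such a block sits in the pipeline from the question, here is a minimal sketch rather than a definitive implementation. The class name DedupPipelineSketch, the method RunAsync, the sample items, and the batch size of 10,000 are all made up for illustration. It uses Distinct() from LINQ instead of a HashSet, and it calls TriggerBatch() by hand where the question would use a Timer callback.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical wrapper class; the names here are not from the question.
public static class DedupPipelineSketch
{
    public static async Task<List<int[]>> RunAsync()
    {
        var results = new List<int[]>();

        // Assumed batch size, large enough that batches are only released by TriggerBatch().
        var batchBlock = new BatchBlock<int>(10_000);

        // Removes duplicates from each batch using the default equality comparer.
        var dedupBlock = new TransformBlock<int[], int[]>(batch => batch.Distinct().ToArray());

        // An ActionBlock processes one message at a time by default, so the list needs no lock.
        var sink = new ActionBlock<int[]>(batch => results.Add(batch));

        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
        batchBlock.LinkTo(dedupBlock, linkOptions);
        dedupBlock.LinkTo(sink, linkOptions);

        // Sample items with duplicates (stand-in for the posting Tasks).
        foreach (var item in new[] { 1, 2, 2, 3, 1 })
            batchBlock.Post(item);

        // In the question, a Timer callback would make this call.
        batchBlock.TriggerBatch();

        batchBlock.Complete();
        await sink.Completion;
        return results;
    }
}
```

Duplicates are only removed within a single batch; an item that shows up again in a later batch is passed on again. For a custom item type, both Distinct() and HashSet accept an IEqualityComparer if the type's own equality members are not suitable.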

Upvotes: 4
