Reputation: 282
I am trying to reduce the memory required to compute reduce_by_key for my use case. I have a relatively small number of unique keys (around 100-150) compared to the number of values (around 16 million). The reduce by key example shows that the device_vectors allocated to contain the result are the same size as the inputs. Is that always necessary? Is it possible to allocate only as much memory as needed to contain the correct output?
Upvotes: 2
Views: 604
Reputation: 13013
The output size of a reduction depends on the input data, and it is typically not known before the reduction runs. However, depending on your problem, you may sometimes know this size in advance.
A reasonable implementation only requires the output to hold at least one element per key span (a run of consecutive equal keys), and thrust::reduce_by_key
appears to be in this category.
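If you can afford one pass over the keys first, you can size the output exactly. Here is a minimal sketch of that counting step in plain C++ rather than Thrust (the name count_key_spans is mine); on the device, the same count could be computed with a Thrust reduction over adjacent key pairs:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count runs of consecutive equal keys ("key spans"). A reduce-by-key
// result needs exactly one element per span, so the output buffers can
// be sized to this count instead of to the input size.
std::size_t count_key_spans(const std::vector<int>& keys) {
    if (keys.empty()) return 0;
    std::size_t spans = 1;  // the first element always starts a span
    for (std::size_t i = 1; i < keys.size(); ++i)
        if (keys[i] != keys[i - 1]) ++spans;  // key changed: new span
    return spans;
}
```

For the three examples below this returns 4, 3, and 7 respectively.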
Output size is hard to predict
keys 2 2 2 3 3 2 1
values 1 1 1 1 1 1 1
|----------|------|----|---|
4 spans
reduced 3 2 1 1
Output size is equal to number of unique keys
keys 1 2 2 2 2 3 3
values 1 1 1 1 1 1 1
|--|---------------|------|
3 spans
reduced 1 4 2
Output size is equal to input size
keys 1 2 3 1 2 3 1
values 1 1 1 1 1 1 1
|--|---|---|---|---|---|--|
7 spans
reduced 1 1 1 1 1 1 1
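To make the three cases concrete, here is a small host-side sketch of reduce_by_key's run-by-run semantics (plain C++, not the Thrust implementation): each run of consecutive equal keys is summed into one output element, so the output size equals the number of spans.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// CPU sketch of reduce_by_key semantics with a plus reduction:
// each run of consecutive equal keys collapses into one output pair.
std::pair<std::vector<int>, std::vector<int>>
reduce_by_key_sketch(const std::vector<int>& keys,
                     const std::vector<int>& vals) {
    std::vector<int> out_keys, out_vals;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (out_keys.empty() || out_keys.back() != keys[i]) {
            out_keys.push_back(keys[i]);   // new span starts here
            out_vals.push_back(vals[i]);
        } else {
            out_vals.back() += vals[i];    // same span: accumulate
        }
    }
    return {out_keys, out_vals};
}
```

Running this on the first example (keys 2 2 2 3 3 2 1, all values 1) yields reduced values 3 2 1 1, matching the diagram.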
One more thing: if you don't need the keys in the output, you can discard them with thrust::discard_iterator
and save some extra space, as described here.
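The idea behind a discard iterator can be shown in a few lines: an output iterator whose writes are no-ops costs no storage. Below is a hand-rolled stand-in in plain C++ (with Thrust you would instead pass thrust::make_discard_iterator() as the keys-output argument of reduce_by_key); the reduce_values_by_key helper here is my own sketch, not a Thrust function:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal stand-in for thrust::discard_iterator: every write through
// it is swallowed, so no memory backs the discarded output.
struct DiscardIterator {
    struct Sink {
        template <typename T>
        const Sink& operator=(const T&) const { return *this; }  // no-op write
    };
    Sink operator*() const { return {}; }
    DiscardIterator& operator++() { return *this; }
};

// Sketch of a keyed reduction where the keys output is generic:
// pass DiscardIterator{} and only the values output consumes memory.
template <typename KeyOut>
std::vector<int> reduce_values_by_key(const std::vector<int>& keys,
                                      const std::vector<int>& vals,
                                      KeyOut key_out) {
    std::vector<int> out_vals;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (i == 0 || keys[i] != keys[i - 1]) {
            *key_out = keys[i];  // vanishes when key_out discards
            ++key_out;
            out_vals.push_back(vals[i]);
        } else {
            out_vals.back() += vals[i];
        }
    }
    return out_vals;
}
```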
Upvotes: 5