Reputation: 9269
I am updating a histogram which is represented using a simple integer array with 16 bins as below.
const int binSize = 4096;
int histogram[16];
unsigned short inData[1024]; // This is my input data. Short is 16 bits
for(int i = 0; i < 1024; ++i)
{
++histogram[inData[i] / binSize];
}
I run this operation very frequently , so this became a bottleneck because this loop is not parallelized by DSP as multiple bins cannot be updated at same time. How can I optimize this?
I am running this code on TI DSP C6000 series.
Upvotes: 1
Views: 269
Reputation: 63039
To give an example of what the comments mean:
#include <array>
#include <algorithm>
#include <boost/range/adaptor/transformed.hpp>
using Histogram = std::array<int, 16>;
Histogram from_short(short num)
{
Histogram result;
result[num / 4096] = 1;
return result;
}
Histogram add(const Histogram & lhs, const Histogram & rhs)
{
Histogram result;
for (size_t i = 0; i < 16; ++i) { result[i] = lhs[i] + rhs[i]; }
return result;
}
auto singles = indata | boost::adaptors::transformed(from_short);
Histogram histogram = std::reduce(begin(singles), end(singles), Histogram{}, add);
Another option:
std::sort(begin(indata), end(indata));
short * previous = begin(indata);
for (size_t i = 0; i < 15; ++i)
{
short * current = std::lower_bound(indata, 4096 * (i + 1));
histogram[i] = std::distance(previous, current);
previous = current;
}
histogram[16] = std::distance(previous, end(indata));
Upvotes: 1