Reputation: 475
I've written an algorithm designed to simulate data produced by an experiment, and then perform a "coincidence search" on that data (more on that in a moment...). The data in question is a vector<vector<double> >, with elements picked from a Gaussian distribution (more-or-less, random numbers). Each "column" represents a "data stream", and each row an instant in time. The "location" of each element in the "array" must be preserved.
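For reference, a stand-in generator for such data might look like this (a minimal sketch; the seed, mean, and standard deviation are placeholders, not the experiment's actual parameters):

#include <cstddef>
#include <random>
#include <vector>

// Minimal sketch of a stand-in data generator: rows are instants in time,
// columns are data streams; all parameters are placeholders.
std::vector<std::vector<double>> makeData(std::size_t rows, std::size_t cols) {
    std::mt19937 gen{42};                              // fixed seed, for illustration
    std::normal_distribution<double> gauss{0.0, 1.0};  // standard Gaussian
    std::vector<std::vector<double>> data(rows, std::vector<double>(cols));
    for (auto &row : data)
        for (auto &x : row)
            x = gauss(gen);
    return data;
}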
The Algorithm:
The algorithm is designed to perform the following task:
Iterate simultaneously through all n columns (data streams), and count the number of times at least c unique columns have an element with an absolute value greater than some threshold, such that the elements lie in a specified time interval (i.e., a certain number of rows). When this occurs, we add one to a counter, and then jump forward in time (row-wise) by some specified amount. We start over again, until we've traversed the entire "array". Finally, we return the value of the counter (the "number of coincidences").
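As a concrete toy case (values are purely illustrative, not from the experiment): with a threshold of 1.0, c = 2 unique columns, and a 2-row interval, the data below contains exactly one coincidence:

#include <vector>

// Toy data (illustrative): 3 columns (streams) x 4 rows (time instants).
// Threshold 1.0, c = 2 unique columns, interval = 2 rows.
std::vector<std::vector<double>> toy = {
    { 0.2,  1.5, 0.1},   // row 0: |1.5|  > 1.0 -> column 1 exceeds
    {-1.7,  0.4, 0.2},   // row 1: |-1.7| > 1.0 -> column 0 exceeds;
                         //        columns {0, 1} within 2 rows -> one coincidence
    { 0.3,  0.1, 0.2},   // row 2: nothing exceeds
    { 0.1,  0.9, 0.3},   // row 3: nothing exceeds
};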
My solution:
I give the code first, then step through it piece-by-piece and explain its operation (hopefully also clarifying some details):
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <execution>
#include <iterator>
#include <set>
#include <vector>
using namespace std;

size_t numOfCoincidences(vector<vector<double>> array, double value_threshold,
                         size_t num_columns, size_t time_threshold){
    set<size_t> cache;
    size_t coincidence_counter = 0, time_counter = 0;
    auto exceeds_threshold = [&](double element){ return fabs(element) >= value_threshold; };
    for(auto row_itr = begin(array); row_itr != end(array); ++row_itr){
        auto &row = *row_itr;
        // collect the column index of every element in this row over threshold
        auto coln_itr = std::find_if(execution::par_unseq, begin(row), end(row), exceeds_threshold);
        while(coln_itr != row.end()){
            cache.insert(distance(begin(row), coln_itr));
            coln_itr = std::find_if(next(coln_itr), end(row), exceeds_threshold);
        }
        if(size(cache) >= num_columns){
            ++coincidence_counter;
            cache.clear();
            // jump forward by the specified amount, measured from the first hit
            if(distance(row_itr, end(array)) > (4004000 - time_counter)){
                advance(row_itr, 4004000 - time_counter);
            } else {
                return coincidence_counter;
            }
        }
        if(time_counter == time_threshold){
            // time window exhausted: rewind and retry from just past the first hit
            row_itr -= (time_counter + 1);
            cache.clear();
        }
        ++time_counter;
        if(cache.size() == 0) time_counter = 0;   // count time from the first hit only
    }
    return coincidence_counter;
}
I iterate through the data (vector<vector<double> > array) row-wise:
for(auto row_itr = begin(array); row_itr != end(array); ++row_itr)
For each row, I use std::find_if to get every element exceeding the value threshold (value_threshold):
auto coln_itr = std::find_if(execution::par_unseq, begin(row), end(row), exceeds_threshold);
while(coln_itr != row.end()){
    cache.insert(distance(begin(row), coln_itr));
    coln_itr = std::find_if(next(coln_itr), end(row), exceeds_threshold);
}
What I'm after is the columnar index, so I use std::distance to get that and store it in an std::set, cache. I choose std::set here because I'm interested in counting the number of unique columns that have a value exceeding value_threshold within some time (i.e., row) interval. By using std::set, I can just dump in the columnar index of every such value, and duplicates are "automatically removed". Then, later, I can simply check the size of cache and, if it's greater than or equal to the specified number (num_columns), I've found a "coincidence".
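As a minimal illustration (a toy snippet of my own, separate from the algorithm) of the de-duplication I'm relying on:

#include <cstddef>
#include <iostream>
#include <set>

int main() {
    std::set<std::size_t> cache;
    cache.insert(3);   // column 3 exceeds the threshold
    cache.insert(7);   // column 7 exceeds the threshold
    cache.insert(3);   // column 3 exceeds again: already present, no effect
    std::cout << cache.size() << '\n';   // prints 2 -> two unique columns
}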
After getting the columnar index of every value exceeding value_threshold, I check the size of cache to see if I've found enough unique columns. If I have, I add one to coincidence_counter, clear the cache, then jump forward in "time" (i.e., rows) by some specified amount (here, 4004000 - time_counter). Notice that I subtract time_counter, which keeps track of the "time" (number of rows) since the first found value(s) exceeding value_threshold: I want to jump forward in time from that starting point.
if(size(cache) >= num_columns){
    ++coincidence_counter;
    cache.clear();
    if(distance(row_itr, end(array)) > (4004000 - time_counter)){
        advance(row_itr, 4004000 - time_counter);
    } else {
        return coincidence_counter;
    }
}
Finally, I check time_counter. Remember that the num_columns unique columns must be within some time (i.e., row) threshold of one another. I start that time count from the first found value exceeding value_threshold. If I've exceeded the time threshold, what I want to do is empty cache and start over, using the second-found value exceeding the value threshold (if there is one) as the new first-found value, and hopefully find a coincidence using that as the starting point. Instead of keeping track of the time (i.e., row index) of each found value, I simply start over at one after the first-found value (i.e., time_counter + 1).
if(time_counter == time_threshold){
    row_itr -= (time_counter + 1);
    cache.clear();
}
I also add one to time_counter with each loop, and set it equal to 0 if cache has size 0 (I want to start counting time (i.e., rows) from the first-found value exceeding value_threshold).
Attempted Optimizations:
I'm not sure if these have helped, hurt, or otherwise; however, here's what I've tried (with little success):
I've replaced all int and unsigned int with size_t. I understand that this may be ever so slightly faster, and these values should never be less than 0 anyhow.
I've also used execution::par_unseq with std::find_if. I'm not sure how much this helps. The "array" typically has about 16-20 columns, but an exceptionally large number of rows (on the order of 50000000 or more). Since std::find_if is "scanning" individual rows, which have only tens of elements at most, perhaps parallelization isn't helping much.
Goals:
Unfortunately, the algorithm takes an exceptionally long time to run. My utmost priority is speed. If possible, I'd like to cut the execution time in half.
Some things to keep in mind:
The "array" is typically on the order of ~20
columns by ~50000000
rows (sometimes longer). It has very few 0's
, and cannot be re-arranged (the order of the "rows", and elements in each row, matters). It takes up (unsurprisingly) a ton of memory, and my machine is therefore quite resource constrained.
I'm also running this as interpreted C++ in cling. In my work, I've never used compiled C++ much. I've tried compiling, but it hasn't helped much. I've also tried playing with compiler optimization flags.
What can be done to cut execution time (at the expense of virtually anything else)?
Please, let me know if I can offer any additional information to assist in answering the question.
Upvotes: 5
Views: 208
Reputation: 65478
This code seems like it might be memory bandwidth bound regardless, but I'd try removing the fancy algorithm stuff in favor of a windowed count. Untested C++:
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using std::fabs;
using std::size_t;
using std::vector;

size_t NumCoincidences(const vector<vector<double>> &array,
                       double value_threshold, size_t num_columns) {
  static constexpr size_t kWindowSize = 4004000;
  const auto exceeds_threshold = [&](double x) {
    return fabs(x) >= value_threshold;
  };
  size_t start = 0;
  std::vector<size_t> num_exceeds_in_window(array[0].size());
  size_t num_coincidences = 0;
  for (size_t i = 0; i < array.size(); i++) {
    // add the current row's exceedances to the per-column window counts
    const auto &row = array[i];
    for (size_t j = 0; j < row.size(); j++) {
      num_exceeds_in_window[j] += exceeds_threshold(row[j]) ? 1 : 0;
    }
    // slide the window: remove the row that just fell out of it
    if (i >= start + kWindowSize) {
      const auto &row = array[i - kWindowSize];
      for (size_t j = 0; j < row.size(); j++) {
        num_exceeds_in_window[j] -= exceeds_threshold(row[j]) ? 1 : 0;
      }
    }
    // count the columns with at least one exceedance in the current window
    size_t total_exceeds_in_window = 0;
    for (size_t n : num_exceeds_in_window) {
      total_exceeds_in_window += n > 0 ? 1 : 0;
    }
    if (total_exceeds_in_window >= num_columns) {
      // coincidence found: restart the window just past the current row
      start = i + 1;
      std::fill(num_exceeds_in_window.begin(), num_exceeds_in_window.end(), 0);
      num_coincidences++;
    }
  }
  return num_coincidences;
}
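The sliding window keeps the per-row work at O(columns), and each row is visited at most twice (once entering the window, once leaving it), with no rewinding. A hypothetical call site (illustrative sizes and threshold, not from the question) would be:

// Hypothetical usage (illustrative values):
vector<vector<double>> data(50000000, vector<double>(20));  // filled from the experiment
size_t n = NumCoincidences(data, /*value_threshold=*/3.0, /*num_columns=*/5);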
Upvotes: 1