10GeV

Reputation: 475

Optimizing this "coincidence search" algorithm for speed

I've written an algorithm designed to simulate the data produced by an experiment, and then perform a "coincidence search" on that data (more on that in a moment). The data in question is a vector<vector<double> >, with elements picked from a Gaussian distribution (more-or-less random numbers). Each "column" represents a "data stream", and each row an instant in time. The "location" of each element in the "array" must be preserved.


The Algorithm:

The algorithm is designed to perform the following task:

Iterate simultaneously through all n columns (data streams), and count the number of times at least c unique columns have an element with an absolute value greater than some threshold, such that the elements lie in a specified time interval (i.e. a certain number of rows).

When this occurs, we add one to a counter, and then jump forward in time (row-wise) by some specified amount. We start over again, until we've traversed the entire "array". Finally, we return the value of the counter (the "number of coincidences").
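For concreteness, input of the kind described above can be mocked up as follows. This is only a sketch: the dimensions, seed, and unit-Gaussian parameters are placeholders, not the experiment's real values.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Build an n_rows x n_cols "array" of Gaussian samples: each column is a
// "data stream", each row an instant in time.
std::vector<std::vector<double>> makeData(std::size_t n_rows, std::size_t n_cols,
                                          unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> dist(0.0, 1.0);
    std::vector<std::vector<double>> data(n_rows, std::vector<double>(n_cols));
    for (auto &row : data)
        for (auto &x : row)
            x = dist(gen);
    return data;
}
```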


My solution:

I give the code first, then step through it piece by piece and explain its operation (hopefully clarifying some details along the way):

size_t numOfCoincidences(vector<vector<double>> array, double value_threshold, size_t num_columns, size_t time_threshold){

    set<size_t> cache;
    size_t coincidence_counter = 0, time_counter = 0;

    auto exceeds_threshold = [&](double element){ return fabs(element) >= value_threshold; };

    for(auto row_itr = begin(array); row_itr != end(array); ++row_itr){

        auto &row = *row_itr;

        auto coln_itr = std::find_if(execution::par_unseq, begin(row), end(row), exceeds_threshold);
        while(coln_itr != end(row)){
            cache.insert(distance(begin(row), coln_itr));
            coln_itr = std::find_if(next(coln_itr), end(row), exceeds_threshold);
        }

        if(size(cache) >= num_columns){

            ++coincidence_counter;
            cache.clear();

            if(distance(row_itr, end(array)) > (4004000 - time_counter)){
                advance(row_itr, (4004000 - time_counter));
            } else {
                return coincidence_counter;
            }

        }


        if(time_counter == time_threshold){
            row_itr -= (time_counter + 1);
            cache.clear();
        }


        if(cache.size() == 0) time_counter = 0;

        ++time_counter;


    }

    return coincidence_counter;

}

How it works...

I iterate through the data (vector<vector<double> > array) row-wise:

for(auto row_itr = begin(array); row_itr != end(array); ++row_itr)

For each row, I use std::find_if to get every element exceeding the value threshold (value_threshold):

        auto coln_itr = std::find_if(execution::par_unseq, begin(row), end(row), exceeds_threshold);
        while(coln_itr != row.end()){
            cache.insert(distance(begin(row), coln_itr));
            coln_itr = std::find_if(next(coln_itr), end(row), exceeds_threshold);
        }

What I'm after is the columnar index, so I use std::distance to get that and store it in an std::set, cache. I choose std::set here because I'm interested in counting the number of unique columns that have a value exceeding value_threshold, within some time (i.e., row) interval. By using std::set, I can just dump the columnar index of every such value, and duplicates are "automatically removed". Then, later, I can simply check the size of cache and, if it's greater than or equal to the specified number (num_columns), I've found a "coincidence".
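The dedup trick can be seen in isolation in the sketch below; `hits` is a hypothetical list of column indices that exceeded the threshold, possibly with repeats across rows.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Count the number of *unique* columns among a list of column hits.
std::size_t uniqueColumns(const std::vector<std::size_t> &hits) {
    std::set<std::size_t> cache;
    for (std::size_t c : hits)
        cache.insert(c);  // inserting a duplicate index is a silent no-op
    return cache.size();
}
```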

After getting the columnar index of every value exceeding value_threshold, I check the size of cache to see if I've found enough unique columns. If I have, I add one to the coincidence_counter, I clear the cache, then jump forward in "time" (i.e., rows) by some specified amount (here, 4004000 - time_counter). Notice that I subtract time_counter, which keeps track of the "time" (# of rows) from the first found value(s) exceeding value_threshold. I want to jump forward in time from that starting point.

        if(size(cache) >= num_columns){

            ++coincidence_counter;
            cache.clear();

            if(distance(row_itr, end(array)) > (4004000 - time_counter)){
                advance(row_itr, (4004000 - time_counter));
            } else {
                return coincidence_counter;
            }

        }
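The jump arithmetic can be checked in miniature (a hypothetical window of 10 rows standing in for 4004000): if the first over-threshold value was `elapsed` rows ago, advancing by `window - elapsed` lands exactly one full window after that starting row.

```cpp
#include <cstddef>

// Compute where the jump lands: one full window after the first hit.
std::size_t jumpTarget(std::size_t current_row, std::size_t elapsed,
                       std::size_t window = 10) {
    return current_row + (window - elapsed);
}
```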

Finally, I check time_counter. Remember that the num_columns unique columns must be within some time (i.e., row) threshold of one another. I start that time count from the first found value exceeding value_threshold. If I've exceeded the time threshold, what I want to do is empty the cache and start over, using the second-found value exceeding the value threshold (if there is one) as the new first-found value, and hopefully find a coincidence from that starting point.

Instead of keeping track of the time (i.e., row index) of each found value, I simply start over at one after the first-found value (i.e., time_counter + 1).

        if(time_counter == time_threshold){
            row_itr -= (time_counter + 1);
            cache.clear();
        }

I also add one to time_counter with each loop iteration, and reset it to 0 whenever cache is empty (I want to start counting time (i.e., rows) from the first-found value exceeding value_threshold).


Attempted Optimizations:

I'm not sure whether these have helped, hurt, or made no difference, but here's what I've tried (with little success):

I've replaced all int and unsigned int with size_t. I understand that this may be ever so slightly faster, and these values should never be less than 0 anyhow.

I've also used execution::par_unseq with std::find_if. I'm not sure how much this helps. The "array" typically has about 16-20 columns, but an exceptionally large number of rows (on the order of 50000000 or more). Since std::find_if is "scanning" individual rows, which only have tens of elements, at most, perhaps parallelization isn't helping much.
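Since each row holds only ~16-20 elements, a plain sequential scan may well beat the parallel dispatch overhead of execution::par_unseq; a sketch of the per-row check without any execution policy:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Sequentially count how many elements of one short row exceed the threshold.
// With only tens of elements, there is little for a parallel policy to win back.
std::size_t countExceeding(const std::vector<double> &row,
                           double value_threshold) {
    return std::count_if(row.begin(), row.end(), [&](double x) {
        return std::fabs(x) >= value_threshold;
    });
}
```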


Goals:

Unfortunately, the algorithm takes an exceptionally long time to run. My utmost priority is speed. If possible, I'd like to cut the execution time in half.

Some things to keep in mind: The "array" is typically on the order of ~20 columns by ~50000000 rows (sometimes more). It has very few 0's, and cannot be rearranged (the order of the "rows", and of the elements in each row, matters). It takes up (unsurprisingly) a ton of memory, and my machine is therefore quite resource-constrained.
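For scale, ~20 columns by ~50,000,000 rows of double works out to about 8 GB of raw payload before any container overhead, which accounts for the memory pressure:

```cpp
#include <cstddef>

// Rough lower bound on the data's footprint; ignores the per-row overhead of
// vector<vector<double>> (each row adds its own allocation plus bookkeeping).
constexpr std::size_t kRows = 50000000;
constexpr std::size_t kCols = 20;
constexpr std::size_t kPayloadBytes = kRows * kCols * sizeof(double);
static_assert(kPayloadBytes == 8000000000ULL, "about 8 GB of doubles");
```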

I'm also running this as interpreted C++ in cling. In my work, I've never used compiled C++ much. I've tried compiling, but it hasn't helped much. I've also tried playing with compiler optimization flags.


What can be done to cut execution time (at the expense of virtually anything else)?

Please, let me know if I can offer any additional information to assist in answering the question.

Upvotes: 5

Views: 208

Answers (1)

David Eisenstat

Reputation: 65478

This code seems like it might be memory bandwidth bound regardless, but I'd try removing the fancy algorithm stuff in favor of a windowed count. Untested C++:

#include <algorithm>
#include <cmath>
#include <vector>

using std::fabs;
using std::size_t;
using std::vector;

size_t NumCoincidences(const vector<vector<double>> &array,
                       double value_threshold, size_t num_columns) {
  static constexpr size_t kWindowSize = 4004000;
  const auto exceeds_threshold = [&](double x) {
    return fabs(x) >= value_threshold;
  };
  size_t start = 0;
  std::vector<size_t> num_exceeds_in_window(array[0].size());
  size_t num_coincidences = 0;
  for (size_t i = 0; i < array.size(); i++) {
    const auto &row = array[i];
    for (size_t j = 0; j < row.size(); j++) {
      num_exceeds_in_window[j] += exceeds_threshold(row[j]) ? 1 : 0;
    }
    if (i >= start + kWindowSize) {
      const auto &row = array[i - kWindowSize];
      for (size_t j = 0; j < row.size(); j++) {
        num_exceeds_in_window[j] -= exceeds_threshold(row[j]) ? 1 : 0;
      }
    }
    size_t total_exceeds_in_window = 0;
    for (size_t n : num_exceeds_in_window) {
      total_exceeds_in_window += n > 0 ? 1 : 0;
    }
    if (total_exceeds_in_window >= num_columns) {
      start = i + 1;
      std::fill(num_exceeds_in_window.begin(), num_exceeds_in_window.end(), 0);
      num_coincidences++;
    }
  }
  return num_coincidences;
}

Upvotes: 1
