Reputation: 3260
I am working with a very big matrix X (say, 1,000-by-1,000,000). My algorithm goes like the following:

Step 1: Go through the columns of X one by one and, based on some filtering rules, identify only a subset of columns that are needed. Denote the subset of column indices by S. Its size depends on the filter, so it is unknown before the computation, and it will change if the filtering rules are different.

Step 2: Given S, do some computation with a column x_i if i is in S. This step needs to be parallelized with OpenMP.

I am wondering what the best way is to implement this procedure in C++. Here are two ways I can think of:
(a) Use a 0-1 array (of length 1,000,000) to indicate the needed columns in Step 1 above; then in Step 2, loop over 1 to 1,000,000 and use an if-else check on the indicator, doing the computation when the indicator is 1 for that column;

(b) Use a std::vector for S and push_back each column index identified as needed; then loop only over S, each time extracting a column index from S and doing the computation. (I thought about using this way, but I've heard that push_back is expensive even when just storing integers.)
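Here is a rough sketch of what I mean by the two options; column_is_needed and process_column are just placeholders for my actual filter and per-column computation:

    #include <vector>

    // Placeholders for my actual filtering rule and per-column work.
    bool column_is_needed(long col);
    void process_column(long col);

    constexpr long kNumCols = 1000000;

    // Option (a): 0-1 indicator array, full loop with a branch in Step 2.
    void option_a() {
        std::vector<char> needed(kNumCols, 0);
        for (long i = 0; i < kNumCols; ++i)           // Step 1
            needed[i] = column_is_needed(i) ? 1 : 0;

        #pragma omp parallel for                      // Step 2
        for (long i = 0; i < kNumCols; ++i)
            if (needed[i]) process_column(i);
    }

    // Option (b): collect the needed indices, then loop only over those.
    void option_b() {
        std::vector<long> S;
        for (long i = 0; i < kNumCols; ++i)           // Step 1
            if (column_is_needed(i)) S.push_back(i);

        const long n = static_cast<long>(S.size());
        #pragma omp parallel for                      // Step 2
        for (long k = 0; k < n; ++k)
            process_column(S[k]);
    }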
Since my algorithm is very time-consuming, I assume that even a small time saving in this basic step would mean a lot overall. So my question is: should I try (a), (b), or some other, even better way for performance (and for working with OpenMP)?
Any suggestions/comments for achieving a better speedup are much appreciated. Thank you very much!
Upvotes: 0
Views: 91
Reputation: 21136
(Based on my gut feeling) I'd probably go for pushing back into a vector, but the answer is quite simple: Measure both methods (they are both trivial to implement). Most likely you won't see a noticeable difference.
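For completeness, a minimal harness for that measurement, assuming the two variants are wrapped in hypothetical functions option_a() and option_b():

    #include <chrono>
    #include <cstdio>

    // Hypothetical wrappers around the two variants under test.
    void option_a();
    void option_b();

    // Wall-clock time of one call, in seconds.
    template <typename F>
    double seconds(F f) {
        const auto t0 = std::chrono::steady_clock::now();
        f();
        const auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double>(t1 - t0).count();
    }

    int main() {
        // One run each; in practice repeat a few times and compare medians.
        std::printf("option (a): %.3f s\n", seconds(option_a));
        std::printf("option (b): %.3f s\n", seconds(option_b));
    }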
Upvotes: 0
Reputation: 7837
I think you will find std::vector easier to use. Regarding push_back, the cost comes when the vector reallocates (and maybe copies) its data. To avoid that (if it matters), you could reserve a capacity of 1,000,000 up front via vector::reserve. Your vector is then 8 MB, insignificant compared to your problem size. It's only an order of magnitude or so bigger than a bitmap would be, and a lot simpler to deal with: if we call your vector S and let i index the interesting columns, then access to the i-th interesting column is just x[S[i]].
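In code that's a single reserve() call up front (column_is_needed here is a stand-in for your filtering rule):

    #include <vector>

    // Stand-in for the questioner's filtering rule.
    bool column_is_needed(long col);

    std::vector<long> collect_indices(long num_cols) {
        std::vector<long> S;
        S.reserve(num_cols);   // one allocation; push_back never reallocates
        for (long i = 0; i < num_cols; ++i)
            if (column_is_needed(i)) S.push_back(i);
        return S;
    }
    // Access to the i-th interesting column is then just x[S[i]].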
Upvotes: 1
Reputation: 8945
To me, it seems that "step #1 really does not matter much." (At the end of the day, you're going to wind up with: "a set of columns, however represented.")
To me, what's really going to matter is: "just what's gonna happen when you unleash ("parallelized ...") step #2."
"An array of 'ones and zeros,'" however large, should be fairly simple for parallelization, while a more-'advanced' data structure might well, in this case, "just get in the way."
"One thousand mega-bits, these days?" Sure. Done. No problem. ("And if not, a simple array of bit-sets.") However-many simultaneously executing entities should be able to navigate such a data structure, in parallel, with a minimum of conflict . . . Therefore, to my gut, "big bit-sets win."
Upvotes: 1