Reputation: 154
For educational purpose I'm developing c++ library for operating with large numbers represented as vectors of chars (vector<char>
).
Here is algorithm that I am using for multiplication:
string multiplicationInner(CharVector a, CharVector b) {
reverse(a.begin(), a.end());
reverse(b.begin(), b.end());
IntVector stack(a.size() + b.size() + 1);
int i, j;
for (i = 0; i < a.size(); i++)
for (j = 0; j < b.size(); j++)
stack[i + j] += charToInt(a[i]) * charToInt(b[j]);
for (int i = 0; i < stack.size(); i++) {
int num = stack[i] % 10;
int move = stack[i] / 10;
stack[i] = num;
if (stack[i + 1])
stack[i + 1] += move;
else if (move)
stack[i + 1] = move;
}
CharVector stackChar = intVectorToCharVector(&stack);
deleteZerosAtEnd(&stackChar);
reverse(stackChar.begin(), stackChar.end());
return charVectorToString(&stackChar);
};
This function is called billion times in my program, so I would like to implement #pragma omp parallel for in it.
My question is: How can i parallelize first cycle?
This is what I have tried:
int i, j;
#pragma omp parallel for
for (i = 0; i < a.size(); i++) {
for (j = 0; j < b.size(); j++)
stack[i + j] += charToInt(a[i]) * charToInt(b[j]);
}
Algorithm stops working properly. Advice needed.
Edit: This variant works, but (with omp parallel for) benchmark shows it is 15x-20x slower than without it. (CPU: M1 Pro, 8 cores)
#pragma omp parallel for schedule(dynamic)
for (int k = 0; k < a.size() + b.size(); k++) {
for (int i = 0; i < a.size(); i++) {
int j = k - i;
if (j >= 0 && j < b.size()) {
stack[k] += charToInt(a[i]) * charToInt(b[j]);
}
}
}
This is part of my program, where multiplication is called most often. (Miller-Rabin test)
BigInt modularExponentiation(BigInt base, BigInt exponent, BigInt mod) {
BigInt x = B_ONE; // 1
BigInt y = base;
while (exponent > B_ZERO) { // while exponent > 0
if (isOdd(exponent))
x = (x * y) % mod;
y = (y * y) % mod;
exponent /= B_TWO; // exponent /= 2
}
return (x % mod);
};
bool isMillerRabinTestOk(BigInt candidate) {
if (candidate < B_TWO)
return false;
if (candidate != B_TWO && isEven(candidate))
return false;
BigInt canditateMinusOne = candidate - B_ONE;
BigInt s = canditateMinusOne;
while (isEven(s))
s /= B_TWO;
for (int i = 0; i < MILLER_RABIN_TEST_ITERATIONS; i++) {
BigInt a = BigInt(rand()) % canditateMinusOne + B_ONE;
BigInt temp = s;
BigInt mod = modularExponentiation(a, temp, candidate);
while (temp != canditateMinusOne && mod != B_ONE && mod != canditateMinusOne) {
mod = (mod * mod) % candidate;
temp *= B_TWO;
}
if (mod != canditateMinusOne && isEven(temp))
return false;
}
return true;
};
Upvotes: 0
Views: 159
Reputation: 5794
Your loops do not have the proper structure for parallelization. However, you can transform them:
for (k=0; k<a.size()+b.size(); k++) {
for (i=0; i<a.size(); i++) {
j=k-i;
stack[k] += a[i] * b[j];
}
Now the outer loop has no conflicts. Look at this as a "coordinate transformation": you're still traversing the same i/j row/column space, but now in new coordinates: k/i stands for diagonal/row.
Btw, this code is a little metaphorical. Check your loop bounds, and use the right multiplication. I'm just indicating the principle here.
Upvotes: 2