Evok
Evok

Reputation: 154

Using omp parallel for in multiplication algorithm (BigInt multiplication)

For educational purpose I'm developing c++ library for operating with large numbers represented as vectors of chars (vector<char>).

Here is algorithm that I am using for multiplication:

string multiplicationInner(CharVector a, CharVector b) {
  reverse(a.begin(), a.end());
  reverse(b.begin(), b.end());

  IntVector stack(a.size() + b.size() + 1);

  int i, j;
  for (i = 0; i < a.size(); i++)
    for (j = 0; j < b.size(); j++)
      stack[i + j] += charToInt(a[i]) * charToInt(b[j]);
 

  for (int i = 0; i < stack.size(); i++) {
    int num = stack[i] % 10;
    int move = stack[i] / 10;
    stack[i] = num;

    if (stack[i + 1])
      stack[i + 1] += move;
    else if (move)
      stack[i + 1] = move;
  }

  CharVector stackChar = intVectorToCharVector(&stack);
  deleteZerosAtEnd(&stackChar);
  reverse(stackChar.begin(), stackChar.end());

  return charVectorToString(&stackChar);
};

This function is called billion times in my program, so I would like to implement #pragma omp parallel for in it.

My question is: How can i parallelize first cycle?

This is what I have tried:

int i, j;
  #pragma omp parallel for
  for (i = 0; i < a.size(); i++) {
    for (j = 0; j < b.size(); j++)
      stack[i + j] += charToInt(a[i]) * charToInt(b[j]);
  }

Algorithm stops working properly. Advice needed.

Edit: This variant works, but (with omp parallel for) benchmark shows it is 15x-20x slower than without it. (CPU: M1 Pro, 8 cores)

#pragma omp parallel for schedule(dynamic)
  for (int k = 0; k < a.size() + b.size(); k++) { 
    for (int i = 0; i < a.size(); i++) {
      int j = k - i;
      if (j >= 0 && j < b.size()) {
        stack[k] += charToInt(a[i]) * charToInt(b[j]);
      }
    }
  }

This is part of my program, where multiplication is called most often. (Miller-Rabin test)

BigInt modularExponentiation(BigInt base, BigInt exponent, BigInt mod) {
  BigInt x = B_ONE; // 1
  BigInt y = base;

  while (exponent > B_ZERO) { // while exponent > 0
    if (isOdd(exponent))
      x = (x * y) % mod;
    y = (y * y) % mod;
    exponent /= B_TWO; // exponent /= 2
  }

  return (x % mod);
};

bool isMillerRabinTestOk(BigInt candidate) {
  if (candidate < B_TWO)
    return false;

  if (candidate != B_TWO && isEven(candidate))
    return false;

  BigInt canditateMinusOne = candidate - B_ONE;
  BigInt s = canditateMinusOne;
  while (isEven(s))
    s /= B_TWO;

  for (int i = 0; i < MILLER_RABIN_TEST_ITERATIONS; i++) {
    BigInt a = BigInt(rand()) % canditateMinusOne + B_ONE;
    BigInt temp = s;
    BigInt mod = modularExponentiation(a, temp, candidate);

    while (temp != canditateMinusOne && mod != B_ONE && mod != canditateMinusOne) {
      mod = (mod * mod) % candidate;
      temp *= B_TWO;
    }

    if (mod != canditateMinusOne && isEven(temp))
      return false;
  }

  return true;
};

Upvotes: 0

Views: 159

Answers (1)

Victor Eijkhout
Victor Eijkhout

Reputation: 5794

Your loops do not have the proper structure for parallelization. However, you can transform them:

for (k=0; k<a.size()+b.size(); k++) { 
  for (i=0; i<a.size(); i++) {
    j=k-i;
    stack[k] += a[i] * b[j];
}

Now the outer loop has no conflicts. Look at this as a "coordinate transformation": you're still traversing the same i/j row/column space, but now in new coordinates: k/i stands for diagonal/row.

Btw, this code is a little metaphorical. Check your loop bounds, and use the right multiplication. I'm just indicating the principle here.

Upvotes: 2

Related Questions