kainwen

Reputation: 376

Make Sparse Matrix Multiply Fast

The code is written in C++11. Each process gets two sparse matrices as input. The test data can be downloaded from enter link description here

The test data contains two files: a0 (sparse matrix 0) and a1 (sparse matrix 1). Each line in a file has the form "i j v", meaning the sparse matrix has the value v at row i, column j. i, j, and v are all integers.

I use the C++11 unordered_map as the sparse matrix data structure:

unordered_map<int, unordered_map<int, double> > matrix1;
matrix1[i][j] = v; // the value at row i, column j of matrix1 is v

The following code took about 2 minutes. The compile command is g++ -O2 -std=c++11 ./matmult.cpp.

The g++ version is 4.8.1, on openSUSE 13.1. My computer: Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz, 4 GB of memory.

#include <iostream>
#include <fstream>
#include <unordered_map>
#include <vector>
#include <thread>

using namespace std;

void load(string fn, unordered_map<int,unordered_map<int, double> > &m) {
  ifstream input ;
  input.open(fn);
  int i, j ; double v;
  while (input >> i >> j >> v)  {
    m[i][j] = v;
  }
}

unordered_map<int,unordered_map<int, double> > m1;
unordered_map<int,unordered_map<int, double> > m2;
//vector<vector<int> > keys(BLK_SIZE);

int main() {
  load("./a0",m1);
  load("./a1",m2);

  // For every row r1 of m1, take its dot product with every row r2 of m2.
  for (auto r1 : m1) {
    for (auto r2 : m2) {
      double sim = 0.0;
      for (auto c1 : r1.second) {
        auto f = r2.second.find(c1.first);
        if (f != r2.second.end()) {
          sim += (f->second) * (c1.second);
        }
      }
    }
  }
  return 0;
}

The code above is too slow. How can I make it run faster? I tried multithreading. The new code follows; the compile command is g++ -O2 -std=c++11 -pthread ./test.cpp. It takes about 1 minute, but I want it to be faster.

How can I make the task faster? Thank you!

#include <iostream>
#include <fstream>
#include <unordered_map>
#include <vector>
#include <thread>

#define BLK_SIZE 8

using namespace std;

void load(string fn, unordered_map<int,unordered_map<int, double> > &m) {
  ifstream input ;
  input.open(fn);
  int i, j ; double v;
  while (input >> i >> j >> v)  {
    m[i][j] = v;
  }
}

unordered_map<int,unordered_map<int, double> > m1;
unordered_map<int,unordered_map<int, double> > m2;
vector<vector<int> > keys(BLK_SIZE);

// Each thread processes the rows of m1 whose keys were placed in keys[blk_id].
void thread_sim(int blk_id) {
  for (auto row1_id : keys[blk_id]) {
    auto r1 = m1[row1_id];
    for (auto r2p : m2) {
      double sim = 0.0;
      for (auto col1 : r1) {
        auto f = r2p.second.find(col1.first);
        if (f != r2p.second.end()) {
          sim += (f->second) * col1.second ;
        }
      }
    }
  }
}

int main() {

  load("./a0",m1);
  load("./a1",m2);

  // Split m1's row keys into BLK_SIZE groups, one per thread.
  int df = BLK_SIZE - (m1.size() % BLK_SIZE);
  int blk_rows = (m1.size() + df) / (BLK_SIZE - 1);
  int curr_thread_id  = 0;
  int index = 0;
  for (auto k : m1) {
    keys[curr_thread_id].push_back(k.first);
    index++;
    if (index==blk_rows) {
      index = 0;
      curr_thread_id++;
    }
  }
  cout << "ok" << endl;
  std::thread t[BLK_SIZE];
  for (int i = 0 ; i < BLK_SIZE ; ++i){
    t[i] = std::thread(thread_sim,i);
  }
  for (int i = 0; i< BLK_SIZE; ++i)
    t[i].join();

  return 0 ;
}

Upvotes: 1

Views: 1568

Answers (2)

Skizz

Reputation: 71070

You haven't specified the time you expect your example to run in or the platform you hope to run on. These are important design constraints for this problem.

There are several areas I can think of for improving the efficiency of this:

  1. Improve the way the data is stored
  2. Improve the multithreading
  3. Improve the algorithm

The first point concerns the way the sparse arrays are stored and the interfaces used to read the data. Nested unordered_maps are a good option when speed isn't important, but there may be more specific data structures better suited to this problem. At best you may find a library that provides a better way to store the data than nested maps; at worst you may have to come up with something yourself.
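As an illustration (my sketch, not part of this answer), one lightweight alternative to nested maps is to keep each row as a vector of (column, value) pairs sorted by column index; the names SparseRow, SparseMatrix and dot below are invented for the sketch. Two sorted rows can then be dotted with a linear merge instead of hash lookups:

#include <unordered_map>
#include <utility>
#include <vector>

// One row: (column index, value) pairs, kept sorted by column index.
typedef std::vector<std::pair<int, double> > SparseRow;
// Whole matrix: row index -> row (sort each row once after loading).
typedef std::unordered_map<int, SparseRow> SparseMatrix;

// Dot product of two sorted sparse rows via a linear merge.
double dot(const SparseRow &a, const SparseRow &b) {
  double sum = 0.0;
  std::size_t i = 0, j = 0;
  while (i < a.size() && j < b.size()) {
    if (a[i].first == b[j].first) {
      sum += a[i].second * b[j].second;
      ++i; ++j;
    } else if (a[i].first < b[j].first) {
      ++i;
    } else {
      ++j;
    }
  }
  return sum;
}

Sorting each row once after loading (std::sort by column index) is cheap next to the quadratic row-pair loop, and the merge then walks two contiguous arrays instead of probing a hash table per element.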

The second point refers to the way multithreading is supported in the language. The original spec for the multithreading system was meant to be platform independent and may miss out on handy features some systems have. Decide which system you want to target and use the OS's threading API. You'll have more control over how the threading works and can possibly reduce the overhead, but you'll lose the cross-platform support.
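For example (a hedged sketch, assuming the openSUSE/Linux box from the question), you could drive the same thread_sim workers with POSIX threads directly and pin each one to a core, something std::thread in C++11 does not expose; compile with -pthread:

#include <pthread.h>
#include <sched.h>  // cpu_set_t, CPU_ZERO, CPU_SET (GNU/Linux extension)

#define BLK_SIZE 8
void thread_sim(int blk_id);  // the worker defined in the question's code

// pthread entry points take and return void*.
static void *worker(void *arg) {
  long blk_id = (long) arg;
  thread_sim((int) blk_id);
  return NULL;
}

int run_with_pthreads() {
  pthread_t t[BLK_SIZE];
  for (long i = 0; i < BLK_SIZE; ++i) {
    pthread_create(&t[i], NULL, worker, (void *) i);
    // Extra control compared with std::thread: pin thread i to a core.
    cpu_set_t cpus;
    CPU_ZERO(&cpus);
    CPU_SET(i % 4, &cpus);  // the i5-4200U has 4 logical cores
    pthread_setaffinity_np(t[i], sizeof(cpus), &cpus);
  }
  for (int i = 0; i < BLK_SIZE; ++i)
    pthread_join(t[i], NULL);
  return 0;
}

Whether pinning actually helps depends on the workload; the point is only that the OS API gives you knobs the portable interface hides.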

The third point will take a bit of work. Is the way you're multiplying the matrices really the most efficient way given the nature of the data? I'm no expert on these things, but it is something to consider, even though it will take a bit of effort.
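To make that concrete with one possibility (my sketch, not the answerer's): the question's loops take every row of m1 against every row of m2, so each row pair pays a hash probe per non-zero even when nothing matches. Indexing m2 by column once lets a row of m1 touch only the non-zeros it actually shares a column with; the function names here are invented for the sketch:

#include <unordered_map>
#include <utility>
#include <vector>

typedef std::unordered_map<int, std::unordered_map<int, double> > Matrix;
typedef std::unordered_map<int, std::vector<std::pair<int, double> > > ColumnIndex;

// Build: column k -> list of (m2 row id, value) for every non-zero in m2.
ColumnIndex index_by_column(const Matrix &m2) {
  ColumnIndex by_col;
  for (const auto &row : m2)
    for (const auto &cell : row.second)
      by_col[cell.first].emplace_back(row.first, cell.second);
  return by_col;
}

// Dot products of one m1 row against every m2 row in a single pass:
// only column keys present in both matrices are ever visited.
std::unordered_map<int, double>
row_similarities(const std::unordered_map<int, double> &row1,
                 const ColumnIndex &m2_by_col) {
  std::unordered_map<int, double> sim;  // m2 row id -> dot product
  for (const auto &cell : row1) {       // cell = (column k, value a)
    auto it = m2_by_col.find(cell.first);
    if (it == m2_by_col.end()) continue;
    for (const auto &rv : it->second)   // rv = (m2 row j, value b)
      sim[rv.first] += cell.second * rv.second;
  }
  return sim;
}

The work is then proportional to the number of matching column pairs rather than to (rows of m1) x (rows of m2) x (non-zeros per row), and it still parallelises over the rows of m1 the same way the question's thread_sim does.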

Lastly, you can always be very specific about the platform you're running on and head into the world of assembly programming. Modern CPUs are complicated beasts. They can sometimes perform operations in parallel. For example, you may be able to do SIMD operations or do parallel integer and floating point operations. Doing this does require a deep understanding of what's going on, and there are useful tools to help you out. Intel did have a tool called VTune (it may be something else now) that would analyse code and highlight potential bottlenecks. Ultimately, you'll want to eliminate areas of the algorithm where the CPU is idle waiting for something to happen (like waiting for data from RAM), either by finding something else for the CPU to do or by improving the algorithm (or both).

Ultimately, in order to improve the overall speed, you'll need to know what is slowing it down. This generally means knowing how to analyse your code and understand the results. Profilers are the general tool for this, but there are platform-specific tools available as well.

I know this isn't quite what you want, but making code fast is really hard and very time-consuming.

Upvotes: 1

janneb

Reputation: 37188

Most of the time, when working with sparse matrices one uses more efficient representations than the nested maps you have. Typical choices are Compressed Sparse Row (CSR) or Compressed Sparse Column (CSC). See https://en.wikipedia.org/wiki/Sparse_matrix for details.
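For illustration only (my sketch, not from the answer), a bare-bones CSR layout looks like this; CsrMatrix and row_sum are invented names:

#include <vector>

// Compressed Sparse Row: for an n-row matrix, row_ptr has n + 1 entries,
// and the non-zeros of row i live at positions row_ptr[i] .. row_ptr[i+1]-1
// of col_idx and values.
struct CsrMatrix {
  std::vector<int>    row_ptr;
  std::vector<int>    col_idx;
  std::vector<double> values;
};

// Walking a row touches one contiguous slice of memory, which is far more
// cache friendly than chasing unordered_map buckets.
double row_sum(const CsrMatrix &m, int row) {
  double s = 0.0;
  for (int k = m.row_ptr[row]; k < m.row_ptr[row + 1]; ++k)
    s += m.values[k];
  return s;
}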

Upvotes: 1
