Reputation: 56
We all know how useful matrix transposition is, and writing a general algorithm for sequential use is no problem. However, I am having some trouble doing the same for multithreaded purposes, and have only gotten this small 4 x 4 case to work properly.
My approach is to assign equal parts of a double[][] structure, in this case 2 x 2 blocks, to each of four threads. In this case, that means starting positions of 0,0 & 0,2 & 2,0 & 2,2, which are passed in via "kol" and "rad".
However, I cannot get this to work on larger matrices, so any help would be appreciated. The closest answer to this problem I've found is here: How to parallelize the matrix transpose?
This is also my inspiration for splitting the double[][] structure into four parts. My (working) 4 x 4 code can be found below, so how can I modify it to work for larger matrices with four threads?
Cheers!
public double[][] transponerMatrise(double[][] matrise, int rad, int kol, int id)
{
    // n (the matrix side length) and element (the swap temporary) are instance fields
    if (id != 2)
    {
        for (int i = rad; i < n/2 + rad; i++)
        {
            for (int j = kol + 1; j < n/2 + kol; j++)
            {
                System.out.println("Traad " + id + " bytter " + i + "," + j + " med " + j + "," + i);
                System.out.println("Value is " + matrise[i][j] + ", " + matrise[j][i]);
                element = matrise[i][j];
                matrise[i][j] = matrise[j][i];
                matrise[j][i] = element;
            }
        }
    }
    else
    {
        for (int i = rad; i < n/2 + rad - 1; i++)
        {
            for (int j = kol; j < n/2 + kol; j++)
            {
                System.out.println("Traad " + id + " bytter " + i + "," + j + " med " + j + "," + i);
                System.out.println("Value is " + matrise[i][j] + ", " + matrise[j][i]);
                element = matrise[i][j];
                matrise[i][j] = matrise[j][i];
                matrise[j][i] = element;
            }
        }
    }
    return matrise;
}
PS: I know the code works properly, because I have a checking method against a working sequential variant.
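For reference, one general way to extend the quadrant idea to any size: tile the matrix into blocks and let each thread swap a tile (bi, bj) with its mirror tile (bj, bi), transposing diagonal tiles in place so no element pair is touched twice. This is only a sketch with my own naming (not the original transponerMatrise signature), assuming a square n x n matrix whose side is divisible by the block size:

```java
import java.util.ArrayList;
import java.util.List;

public class BlockTranspose {
    // In-place transpose of a square matrix, parallelized over tiles.
    // Only tiles with bi <= bj are scheduled, so every swap pair
    // (i,j)/(j,i) is handled by exactly one thread.
    static void transpose(double[][] m, int blockSize) throws InterruptedException {
        int n = m.length;
        List<Thread> threads = new ArrayList<>();
        for (int bi = 0; bi < n; bi += blockSize) {
            for (int bj = bi; bj < n; bj += blockSize) {
                final int i0 = bi, j0 = bj;
                Thread t = new Thread(() -> transposeTile(m, i0, j0, blockSize));
                threads.add(t);
                t.start();
            }
        }
        for (Thread t : threads) t.join();
    }

    // Swap tile (i0, j0) with its mirror tile (j0, i0).
    static void transposeTile(double[][] m, int i0, int j0, int blockSize) {
        for (int i = i0; i < i0 + blockSize; i++) {
            // On a diagonal tile (i0 == j0), start past the diagonal
            // so each element pair is swapped only once.
            int jStart = (i0 == j0) ? i + 1 : j0;
            for (int j = jStart; j < j0 + blockSize; j++) {
                double tmp = m[i][j];
                m[i][j] = m[j][i];
                m[j][i] = tmp;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 8;
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                m[i][j] = i * n + j;
        transpose(m, 4);
        System.out.println(m[0][5] + " " + m[5][0]); // prints 40.0 5.0
    }
}
```

Spawning one thread per tile is for illustration only; in practice you would hand the tiles to a small fixed pool, as the answer below does with rows.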
Upvotes: 2
Views: 1142
Reputation: 1261
Based on the approach of user3666197, you could do something like this:
public class Matrix {

    private class Index {
        Index(final int row, final int col) {
            this.row = row;
            this.col = col;
        }

        int col;
        int row;
    }

    public Matrix(final int rows, final int cols) {
        this.rows = rows;
        this.cols = cols;
        data = new int[rows][cols];
    }

    public int get(final int row, final int col) {
        return get(getIndex(row, col));
    }

    public void set(final int row, final int col, final int value) {
        set(getIndex(row, col), value);
    }

    public void transpose() {
        transpositioned = !transpositioned;
    }

    private int get(final Index index) {
        return data[index.row][index.col];
    }

    private Index getIndex(final int row, final int col) {
        return transpositioned ? new Index(col, row) : new Index(row, col);
    }

    private void set(final Index index, final int value) {
        data[index.row][index.col] = value;
    }

    private final int cols;
    private final int[][] data;
    private final int rows;
    private boolean transpositioned;
}
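A quick usage sketch of this lazy approach (compressed here into a standalone demo class of my own, so it runs on its own): transpose() only flips a flag, so it is O(1), and reads and writes are redirected through the index mapping:

```java
public class LazyTransposeDemo {
    // Minimal version of the Matrix idea above: transpose() flips a flag,
    // and get/set swap the indices when the flag is set.
    static class Matrix {
        private final int[][] data;
        private boolean transposed;

        Matrix(int rows, int cols) { data = new int[rows][cols]; }

        int get(int row, int col) {
            return transposed ? data[col][row] : data[row][col];
        }

        void set(int row, int col, int value) {
            if (transposed) data[col][row] = value;
            else data[row][col] = value;
        }

        void transpose() { transposed = !transposed; } // O(1): no data moves
    }

    public static void main(String[] args) {
        Matrix m = new Matrix(2, 3);
        m.set(0, 2, 42);
        m.transpose();                   // constant time, regardless of size
        System.out.println(m.get(2, 0)); // prints 42
    }
}
```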
Upvotes: 0
Reputation: 1261
Maybe a simple idea is to use one thread per row to swap a row into a column, but then you need twice the memory for your matrix, which could be a problem with huge matrices. Also, if you have a 6-core CPU, I think there is not much benefit in using 100 threads, so my thread pool is very small. And as @user3666197 mentioned, it is still an expensive solution - but parallel ;-)
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MatrixTransposition {

    public static void main(final String[] args) throws InterruptedException {
        final MatrixTransposition transposition = new MatrixTransposition();
        final int[][] source = transposition.create(32);
        final int[][] transposed = transposition.solve(source);
        System.out.println("Compare source and transposition = " + transposition.compare(source, transposed));
        final int[][] result = transposition.solve(transposed);
        System.out.println("Compare source and double transposition = " + transposition.compare(source, result));
        transposition.print(source);
        transposition.print(transposed);
        transposition.threadPool.shutdown(); // otherwise the non-daemon pool threads keep the JVM alive
    }

    public boolean compare(final int[][] a, final int[][] b) {
        for (int r = 0; r < a.length; r++) {
            for (int c = 0; c < a[0].length; c++) {
                if (a[r][c] != b[r][c]) return false;
            }
        }
        return true;
    }

    public int[][] create(final int size) {
        final int[][] result = new int[size][size];
        for (int r = 0; r < size; r++) {
            for (int c = 0; c < size; c++) {
                result[r][c] = r * size + c;
            }
        }
        return result;
    }

    public void print(final int[][] input) {
        final int size = input.length;
        final int maxNr = size * size;
        final int digits = String.valueOf(maxNr).length();
        final String cellFormat = "%0" + digits + "d ";
        for (final int[] row : input) {
            for (final int c : row) {
                System.out.print(String.format(cellFormat, c));
            }
            System.out.println();
        }
        System.out.println();
    }

    public int[][] solve(final int[][] input) throws InterruptedException {
        final int rows = input.length;
        final int cols = input[0].length;
        // transposed dimensions: [cols][rows] (using [rows][cols] would only work for square input)
        final int[][] result = new int[cols][rows];
        final CountDownLatch latch = new CountDownLatch(rows);
        for (int r = 0; r < rows; r++) {
            final int row = r;
            threadPool.execute(() -> {
                solvePart(result, input, row);
                latch.countDown();
            });
        }
        latch.await();
        return result;
    }

    private void solvePart(final int[][] result, final int[][] input, final int r) {
        System.out.println("Solve row " + String.format("%02d", r) + " in thread " + Thread.currentThread().getName());
        final int[] row = input[r];
        for (int c = 0; c < row.length; c++) {
            result[c][r] = row[c];
        }
    }

    private final ExecutorService threadPool = Executors.newFixedThreadPool(6);
}
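One possible refinement (my own variation, not part of the answer above): submitting one task per row creates many tiny tasks, and the scheduling overhead can dominate on small rows. Batching a contiguous band of rows per task keeps the pool busy with far fewer round-trips. A sketch under the same out-of-place scheme:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ChunkedTranspose {
    // Out-of-place transpose; each task copies a contiguous band of rows.
    public static int[][] solve(int[][] input, int nThreads) throws InterruptedException {
        final int rows = input.length, cols = input[0].length;
        final int[][] result = new int[cols][rows];
        final ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        final int chunk = (rows + nThreads - 1) / nThreads; // ceiling division
        final CountDownLatch latch = new CountDownLatch(nThreads);
        for (int t = 0; t < nThreads; t++) {
            final int from = t * chunk;
            final int to = Math.min(rows, from + chunk);
            pool.execute(() -> {
                for (int r = from; r < to; r++)
                    for (int c = 0; c < cols; c++)
                        result[c][r] = input[r][c];
                latch.countDown(); // counted down even for an empty band
            });
        }
        latch.await();
        pool.shutdown(); // let the JVM exit
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        int[][] a = { {1, 2, 3}, {4, 5, 6} };
        int[][] b = solve(a, 2);
        System.out.println(b[2][1]); // prints 6
    }
}
```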
Upvotes: 1
Reputation: 1
If I may turn your attention to a smarter approach, with an almost O(1) (constant) cost instead of an O(N^2) cost-function, the trick would help you start working in a way more promising direction.
You may try to add one thin abstraction layer above the high-performance-tuned (cache-line-friendly) "raw" storage of the matrix elements. This abstracted layer helps to access the "raw" storage (using indexing-axes, index-striding and slicing tricks, the very same way the HPC FORTRAN libraries have inspired the well-known numpy and its striding tricks). This way the .T method does nothing expensive (like bulldozing N^2 memory locations, swapping them there and back) but simply swaps the axes-parameters in the abstraction layer responsible for the indirection mapping, taking a few tens of nanoseconds for whatever large matrise[N,N]; for N = 10, 100, 1000, 10000, 100000, 1000000, 10000000+ it is still a few [ns].
There is nothing faster in doing this or other, more complex, matrix operations, and both the FORTRAN and numpy performance-polished methods are a proof per se of this observation.
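In Java terms, the abstraction layer this answer describes could look like a tiny strided view: element (r, c) lives at offset r*rowStride + c*colStride in a flat array, and transposing just swaps the two strides. This is my own minimal sketch of the idea (not code from any library):

```java
public class StridedView {
    private final double[] data;   // flat "raw" storage, shared between views
    private final int rows, cols;
    private final int rowStride, colStride;

    public StridedView(int rows, int cols) {
        this(new double[rows * cols], rows, cols, cols, 1);
    }

    private StridedView(double[] data, int rows, int cols, int rowStride, int colStride) {
        this.data = data;
        this.rows = rows;
        this.cols = cols;
        this.rowStride = rowStride;
        this.colStride = colStride;
    }

    public double get(int r, int c) { return data[r * rowStride + c * colStride]; }

    public void set(int r, int c, double v) { data[r * rowStride + c * colStride] = v; }

    // O(1) transpose: same storage, axes (and strides) swapped.
    public StridedView transpose() {
        return new StridedView(data, cols, rows, colStride, rowStride);
    }

    public static void main(String[] args) {
        StridedView a = new StridedView(2, 3);
        a.set(0, 2, 7.0);
        StridedView t = a.transpose();   // no element is copied
        System.out.println(t.get(2, 0)); // prints 7.0
    }
}
```

The trade-off, as with numpy views, is that later traversal of a transposed view walks memory with a non-unit stride, which is less cache-friendly than a physically transposed copy.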
Upvotes: 2