3D FFT decomposition into 2D FFTs

I am solving the diffusion equation in 3D using FFTs, and one way to parallelise this is to decompose the 3D FFT into 2D FFTs.

As described in this paper: https://cmb.ornl.gov/members/z8g/csproject-report.pdf

The 3D FFT is decomposed as follows:

  1. 2D FFT in the xy directions
  2. global transpose
  3. 1D FFT in the z direction

My problem is that I am not sure how to do this global transpose (I assume it means transposing a 3D array). Has anyone come across this? Thanks a lot.

Upvotes: 3

Views: 2504

Answers (2)

Mark Borgerding

Reputation: 8476

Think of a 3D cube with nx*ny*nz elements. The 3D FFT of these elements is mathematically three stages of 1D FFTs, one along each axis:

  1. ny*nz transforms along the X axis, each handling nx elements
  2. nx*nz transforms along the Y axis, each handling ny elements
  3. nx*ny transforms along the Z axis, each handling nz elements
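A minimal pure-Python sketch of the three stages above, assuming the cube is stored as nested lists indexed `a[z][y][x]`. The helper names (`dft`, `fft3_by_axes`, `dft3_direct`) are hypothetical, and a naive O(n^2) DFT stands in for a real FFT routine because it produces the same values; the point is the stage structure and the transform counts, which are checked against a direct triple-sum 3D DFT.

```python
import cmath
import itertools

def dft(x):
    # Naive O(n^2) 1D DFT standing in for an FFT routine (same result, slower).
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def fft3_by_axes(a):
    """3D DFT of a[z][y][x] computed as three stages of 1D transforms."""
    nz, ny, nx = len(a), len(a[0]), len(a[0][0])
    b = [[list(row) for row in plane] for plane in a]  # work on a copy
    count = 0
    # Stage 1: ny*nz transforms along X, each of length nx.
    for z in range(nz):
        for y in range(ny):
            b[z][y] = dft(b[z][y])
            count += 1
    # Stage 2: nx*nz transforms along Y, each of length ny.
    for z in range(nz):
        for x in range(nx):
            col = dft([b[z][y][x] for y in range(ny)])
            for y in range(ny):
                b[z][y][x] = col[y]
            count += 1
    # Stage 3: nx*ny transforms along Z, each of length nz.
    for y in range(ny):
        for x in range(nx):
            line = dft([b[z][y][x] for z in range(nz)])
            for z in range(nz):
                b[z][y][x] = line[z]
            count += 1
    return b, count

def dft3_direct(a):
    # Reference: the 3D DFT written as one triple sum per output element.
    nz, ny, nx = len(a), len(a[0]), len(a[0][0])
    out = [[[0j] * nx for _ in range(ny)] for _ in range(nz)]
    for kz, ky, kx in itertools.product(range(nz), range(ny), range(nx)):
        out[kz][ky][kx] = sum(
            a[z][y][x] * cmath.exp(-2j * cmath.pi * (kx * x / nx + ky * y / ny + kz * z / nz))
            for z, y, x in itertools.product(range(nz), range(ny), range(nx)))
    return out

nz, ny, nx = 2, 3, 4
a = [[[float(x + 10 * y + 100 * z) for x in range(nx)] for y in range(ny)] for z in range(nz)]
fast, n_transforms = fft3_by_axes(a)
ref = dft3_direct(a)
err = max(abs(fast[z][y][x] - ref[z][y][x])
          for z in range(nz) for y in range(ny) for x in range(nx))
print(n_transforms)  # 26  (= 3*2 + 4*2 + 4*3)
print(err < 1e-8)    # True
```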

More generally, an N-dimensional FFT (N>1) is composed of many (N-1)-dimensional FFTs followed by 1D FFTs along the remaining axis.
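The recursive structure can be sketched in pure Python for nested lists of any depth: transform every (N-1)-dimensional slice `a[i]`, then run 1D transforms along the first axis at every position of the other axes. This is an illustration under stated assumptions, not how kissfft or any particular library is written; `fftn` and `combine` are hypothetical names, and the naive `dft` again stands in for an FFT. For a 3D array indexed `a[z][y][x]` this is exactly the "2D in xy, then 1D in z" split from the question.

```python
import cmath

def dft(x):
    # Naive O(n^2) 1D DFT standing in for an FFT routine.
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def fftn(a):
    """N-D DFT of a nested list: (N-1)-D transforms of every slice a[i],
    followed by 1D transforms along the remaining (first) axis."""
    if not isinstance(a[0], list):   # 1D case: plain DFT
        return dft(a)
    slices = [fftn(s) for s in a]    # (N-1)-D FFT of every slice
    return combine(slices)

def combine(slices):
    # 1D DFT across the slices (axis 0) at every position of the other axes.
    if not isinstance(slices[0], list):
        return dft(slices)
    parts = [combine([s[i] for s in slices]) for i in range(len(slices[0]))]
    # parts[i][k] is indexed (other axis, transformed axis); swap back so the
    # transformed axis is first again: out[k][i] = parts[i][k].
    return [[p[k] for p in parts] for k in range(len(slices))]

a = [[[complex(x - y + 2 * z) for x in range(3)] for y in range(2)] for z in range(2)]
out = fftn(a)
print(round(out[0][0][0].real, 6))  # 18.0 (the DC term is the sum of all elements)
```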

If the signal is real and your FFT can return the half spectrum, stage 1 becomes about half as expensive (a real FFT is cheaper than a complex one). The remaining stages still need to be complex, but since each stage-1 transform yields only nx/2+1 complex outputs, they need only about half as many transforms. So the total cost is roughly halved.
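A small sketch of why the half spectrum is enough, assuming stage 1 is a real-to-complex transform that keeps outputs 0..nx/2: for real input, X[n-k] is the complex conjugate of X[k], so the upper half carries no new information. `half_spectrum_counts` is a hypothetical helper that only tallies the transforms per stage.

```python
import cmath

def dft(x):
    # Naive O(n^2) 1D DFT standing in for an FFT routine.
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def half_spectrum_counts(nx, ny, nz):
    """Hypothetical helper: 1D transforms per stage when stage 1 is a
    real-to-complex FFT keeping only nx // 2 + 1 outputs per transform."""
    nh = nx // 2 + 1
    return {
        "stage1_real": ny * nz,     # real transforms of length nx
        "stage2_complex": nh * nz,  # complex transforms of length ny
        "stage3_complex": nh * ny,  # complex transforms of length nz
    }

x = [1.0, 4.0, -2.0, 0.5, 3.0, 7.0]  # real input, n = 6
X = dft(x)
n = len(x)
# Hermitian symmetry: X[n-k] == conj(X[k]), so X[0..n//2] determines all of X.
print(all(abs(X[n - k] - X[k].conjugate()) < 1e-9 for k in range(1, n)))  # True
print(half_spectrum_counts(64, 64, 64))
# {'stage1_real': 4096, 'stage2_complex': 2112, 'stage3_complex': 2112}
```

For a 64x64x64 complex input, stages 2 and 3 would each need 4096 transforms; with the half spectrum they drop to 2112, which is where the "roughly half" comes from.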

If your 1D FFT can read strided input elements and pack its output into a contiguous buffer, you end up doing a transposition at each stage.
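A sketch of the strided-read/contiguous-write idea on a flat, row-major array. The layout and the names `strided_pass` and `fftn_flat` are assumptions for illustration, not kissfft's API. Each pass transforms along the first axis, reading with stride `prod(shape[1:])` and writing each result contiguously, so the axes come out rotated by one position; after one pass per dimension they are back in their original order, and no separate transpose step is needed.

```python
import cmath
from math import prod

def dft(x):
    # Naive O(n^2) 1D DFT standing in for an FFT routine.
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def strided_pass(data, shape):
    """One stage: 1D DFTs along the first axis of a flat row-major array.
    Input is read with stride prod(shape[1:]); each output line is written
    contiguously, so the result has shape shape[1:] + shape[:1]."""
    n0, rest = shape[0], prod(shape[1:])
    out = [0j] * len(data)
    for r in range(rest):
        line = dft(data[r::rest])        # strided read: r, r+rest, r+2*rest, ...
        out[r * n0:(r + 1) * n0] = line  # contiguous write
    return out, shape[1:] + shape[:1]

def fftn_flat(data, shape):
    # One pass per dimension; each pass rotates the axes left by one, so after
    # len(shape) passes every axis has been transformed and the order is restored.
    for _ in range(len(shape)):
        data, shape = strided_pass(data, shape)
    return data, shape

shape = (2, 3, 4)
data = [complex(i % 5) for i in range(prod(shape))]
out, out_shape = fftn_flat(data, shape)
print(out_shape)  # (2, 3, 4)
```

For this shape the intermediate layouts are `(3, 4, 2)` and `(4, 2, 3)` before returning to `(2, 3, 4)`.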

This is how kissfft performs multi-dimensional FFTs.

P.S. When I need a mental picture of higher dimensions, I think of: sheets of paper with matrices of numbers (2D), in folders of numbered papers (3D), in numbered filing cabinets (4D), in numbered rooms (5D), in numbered buildings (6D), and so on. That way I can visualize the "filing cabinet" dimension.

Upvotes: 10

zonksoft

Reputation: 2429

The "global transposition" mentioned in the paper is not a mathematical operation, but a rearrangement of data between the distributed memory machines.

The data calculated on each machine in step 1 has to be transferred to all the other machines (and vice versa) before step 2. It has nothing to do with a matrix transposition.
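To make the data movement concrete, here is a single-process simulation of a slab decomposition on `p` hypothetical machines, assuming `nz` and `ny` are both divisible by `p`. Each machine starts with a block of z-planes and does its 2D FFTs locally; the exchange is then an all-to-all in which every machine sends each other machine the y-rows that machine will own, so afterwards each machine holds complete z-lines for its block of y and can run the 1D FFTs along z without further communication. Function names are illustrative, and the naive `dft` stands in for an FFT.

```python
import cmath

def dft(x):
    # Naive O(n^2) 1D DFT standing in for an FFT routine.
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def dft2(plane):
    # 2D DFT of plane[y][x]: transform each row along x, then each column along y.
    rows = [dft(r) for r in plane]
    ny, nx = len(rows), len(rows[0])
    cols = [dft([rows[y][x] for y in range(ny)]) for x in range(nx)]
    return [[cols[x][y] for x in range(nx)] for y in range(ny)]

def distributed_fft3(a, p):
    """Simulate a slab-decomposed 3D FFT of a[z][y][x] on p machines.
    Assumes nz and ny are divisible by p."""
    nz, ny, nx = len(a), len(a[0]), len(a[0][0])
    bz, by = nz // p, ny // p
    # Initial layout: machine r owns z-planes r*bz .. (r+1)*bz - 1.
    local = [a[r * bz:(r + 1) * bz] for r in range(p)]
    # Step 1: local 2D FFTs in xy; no communication needed.
    local = [[dft2(plane) for plane in planes] for planes in local]
    # Exchange (all-to-all): machine r cuts out, from every plane it owns,
    # the y-rows that belong to machine q and sends them to q.
    send = [[[plane[q * by:(q + 1) * by] for plane in local[r]] for q in range(p)]
            for r in range(p)]
    # Machine q concatenates what it received from r = 0, 1, ..., p-1, which
    # restores global z order: recv[q][z][y_local][x].
    recv = [[plane for r in range(p) for plane in send[r][q]] for q in range(p)]
    # Step 2: local 1D FFTs along z, now that every z-line is on one machine.
    result = []
    for blk in recv:
        out = [[[0j] * nx for _ in range(by)] for _ in range(nz)]
        for y in range(by):
            for x in range(nx):
                line = dft([blk[z][y][x] for z in range(nz)])
                for z in range(nz):
                    out[z][y][x] = line[z]
        result.append(out)
    return result  # result[q][kz][ky - q*by][kx]

# Usage: a 4 x 4 x 3 array on 2 machines, then gather into one global array.
nz, ny, nx, p = 4, 4, 3, 2
a = [[[complex((x + 2 * y + 3 * z) % 4) for x in range(nx)]
      for y in range(ny)] for z in range(nz)]
parts = distributed_fft3(a, p)
by = ny // p
full = [[parts[y // by][z][y % by] for y in range(ny)] for z in range(nz)]
```

On a real cluster the `send`/`recv` step would typically be a single all-to-all collective such as MPI's `MPI_Alltoall`, with each machine packing its outgoing chunks into one contiguous buffer.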

Upvotes: 2
