Shayan

Reputation: 6345

What is the most efficient way to reshape a 3D array into a 2D array with a specific order in Julia?

Consider the following Array:

julia> test_mat3 = rand(2, 3, 2)
2×3×2 Array{Float64, 3}:
[:, :, 1] =
 0.539697  0.17689   0.944783
 0.537687  0.660573  0.19497

[:, :, 2] =
 0.726739    0.798789  0.0900478
 0.00744727  0.837076  0.25798

I want to reshape it into a 2D array with a specific order. First, I want to concatenate the rows of each inner matrix (here of shape 2×3), so after this step I expect one 1×6 row per slice. Second, I want to stack these rows vertically. The expected output for test_mat3 would be:

2×6 Array{Float64, 2}:
 0.539697    0.17689   0.944783    0.537687    0.660573  0.19497
 0.726739    0.798789  0.0900478   0.00744727  0.837076  0.25798

However, I wrote this code:

using SplitApplyCombine

function flatten_3d(arr::Array{Float64, 3})
    # One output row per slice along dim 3; each row holds one flattened slice.
    mat = zeros(
        Float64,
        size(arr, 3),
        size(arr, 1) * size(arr, 2)
    )

    for (idx, slice) in enumerate(splitdimsview(arr, 3))
        # Concatenate the slice's rows into one long vector.
        mat[idx, :] = vcat(eachrow(slice)...)
    end

    return mat
end

The script does the job:

julia> flatten_3d(test_mat3)
2×6 Matrix{Float64}:
 0.539697  0.17689   0.944783   0.537687    0.660573  0.19497
 0.726739  0.798789  0.0900478  0.00744727  0.837076  0.25798

But I wonder: is there a better way to do this? A built-in function, perhaps?

Upvotes: 1

Views: 371

Answers (2)

mcabbott

Reputation: 2580

I made a package for figuring such things out. This one looks like:

julia> using TensorCast

julia> @cast mat[k,(j,i)] := test_mat3[i,j,k];

julia> mat == flatten_3d(test_mat3)
true

To read this:

  • the 3rd dimension of the input becomes the 1st dimension of the output. It has a temporary name, k.
  • each "inner matrix" is indexed by i, j. Transposing would give j, i; combining these into one dimension with vec would give (i,j), in which i changes fastest (as arrays are column-major).
  • in (j,i) it is j that changes fastest, like transpose then vec. This is placed into the 2nd dimension of the output.
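Concretely, the `(j,i)` grouping of one slice is the same as transposing and then flattening it. A minimal Base-only check (no TensorCast needed):

```julia
# Combining indices as (j,i) means j varies fastest, which matches
# flattening the transpose (Julia arrays are column-major).
slice = [1 2 3;
         4 5 6]
vec(permutedims(slice))  # rows concatenated: [1, 2, 3, 4, 5, 6]
```

So the `(j,i)` in the `@cast` pattern encodes exactly the "concatenate each row" step of the question.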

What it produces (in this instance) is just permutedims then reshape. One way to use it is simply as a tool to figure out the right permutation:

julia> @pretty  @cast mat[k,(j,i)] := test_mat3[i,j,k]
begin
    @boundscheck ndims(test_mat3) == 3 || throw(ArgumentError("expected a 3-tensor test_mat3[i, j, k]"))
    local (ax_i, ax_j, ax_k) = (axes(test_mat3, 1), axes(test_mat3, 2), axes(test_mat3, 3))
    local elephant = transmute(test_mat3, Val((3, 2, 1)))
    mat = reshape(elephant, (ax_k, star(ax_j, ax_i)))
end

This is in fact more like PermutedDimsArray plus reshape: by default it returns a view of the input when possible, not a copy. That is very quick, but the next operation on the result may or may not be. To make a Matrix instead (with |=), it borrows a better algorithm for permutations from TensorOperations.jl, which is faster on large arrays:

julia> flatten_lazy(x) = @cast _[k,(j,i)] := x[i,j,k];

julia> flatten_eager(x) = @cast _[k,(j,i)] |= x[i,j,k];

julia> using BenchmarkTools

julia> let test_mat = rand(20,30,200)
         a = @btime flatten_3d($test_mat)
         b = @btime reshape(permutedims($test_mat, (3,2,1)), (200, 600))
         c = @btime flatten_lazy($test_mat)
         d = @btime flatten_eager($test_mat)
         a == b == c == d
       end
  956.708 μs (17002 allocations: 3.17 MiB)
  143.875 μs (3 allocations: 937.62 KiB)
  63.520 ns (0 allocations: 0 bytes)  # note the units, no time at all!
  53.291 μs (63 allocations: 942.45 KiB)
true

The package's documentation is here: https://mcabbott.github.io/TensorCast.jl/dev/basics/

Upvotes: 4

Dan Getz

Reputation: 18217

The following code should help:

using Combinatorics

test_mat3 = zeros(2, 3, 2)
test_mat3[:, :, 1] =
  [0.539697  0.17689   0.944783;
   0.537687  0.660573  0.19497]
test_mat3[:, :, 2] =
  [0.726739    0.798789  0.0900478;
   0.00744727  0.837076  0.25798]

for p in permutations(1:3)
    A = reshape(permutedims(test_mat3,p),(2,6))
    @show p
    display(A)
end

Gives:

p = [1, 2, 3]
2×6 Matrix{Float64}:
 0.539697  0.17689   0.944783  0.726739    0.798789  0.0900478
 0.537687  0.660573  0.19497   0.00744727  0.837076  0.25798
p = [1, 3, 2]
2×6 Matrix{Float64}:
 0.539697  0.726739    0.17689   0.798789  0.944783  0.0900478
 0.537687  0.00744727  0.660573  0.837076  0.19497   0.25798
p = [2, 1, 3]
2×6 Matrix{Float64}:
 0.539697  0.944783  0.660573  0.726739  0.0900478   0.837076
 0.17689   0.537687  0.19497   0.798789  0.00744727  0.25798
p = [2, 3, 1]
2×6 Matrix{Float64}:
 0.539697  0.944783  0.798789   0.537687  0.19497     0.837076
 0.17689   0.726739  0.0900478  0.660573  0.00744727  0.25798
p = [3, 1, 2]
2×6 Matrix{Float64}:
 0.539697  0.537687    0.17689   0.660573  0.944783   0.19497
 0.726739  0.00744727  0.798789  0.837076  0.0900478  0.25798
p = [3, 2, 1]
2×6 Matrix{Float64}:
 0.539697  0.17689   0.944783   0.537687    0.660573  0.19497
 0.726739  0.798789  0.0900478  0.00744727  0.837076  0.25798

So, the requested permutation is:

julia> reshape(permutedims(test_mat3, (3,2,1)), (2,6))
2×6 Matrix{Float64}:
 0.539697  0.17689   0.944783   0.537687    0.660573  0.19497
 0.726739  0.798789  0.0900478  0.00744727  0.837076  0.25798
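That one-liner is hard-coded to a 2×3×2 input. A size-generic sketch (flatten3 is a name I'm introducing here, not part of either answer) could be:

```julia
# Move dim 3 to the rows, then collapse the remaining two dims into columns.
flatten3(x::AbstractArray{<:Any,3}) = reshape(permutedims(x, (3, 2, 1)), size(x, 3), :)

A = reshape(1:12, 2, 3, 2)  # a small 2×3×2 example
flatten3(A)                 # 2×6 Matrix: one row per slice, rows of each slice concatenated
```

Using `:` as the second reshape argument lets Julia infer the column count, so the same function works for any 3D array.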

ADDENDUM: Why (3,2,1) ?

permutedims uses the permutation to make the original axis #3 into axis #1 of the new array (overall, (3,2,1) means 3=>1, 2=>2, 1=>3).

Julia treats columns as the first axis (rows are the second). This, incidentally, is why it is always preferable to read/write column elements sequentially for fast memory access.
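A tiny Base-only illustration of that column-major order:

```julia
M = [1 2;
     3 4]
vec(M)      # walks down columns first: [1, 3, 2, 4]
M[1], M[2]  # linear indexing follows the same order: (1, 3)
```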

In the OP, 0.5396… and 0.7267… form the first output column, but in the input they share the same row and column while sitting in different planes (the third axis). So we permute axis 3=>1.

In the OP, 0.5396… and 0.1768… are the first two values of the first output row, but in the input they share the same row and plane while sitting in different columns (the second axis). After the reshape, the second and third axes are combined and, again, the earlier axis changes fastest. We want to move from 0.53… to 0.17… first, not to 0.5376… (which would be moving along the first input axis), so we keep axis 2=>2.

The third axis completes the permutation. This explanation is easier to grasp by following the numbers in the all-permutations example above than by reading ;)

Upvotes: 3
