pventura

Reputation: 117

Shaping Julia multidimensional arrays

I am new to Julia and am trying to build a properly shaped multidimensional array.

function get_deets(curric)
    curric = curric.metrics
    return ["" curric["complexity"][1] curric["blocking factor"][1] curric["delay factor"][1]] 
end

function compare_currics(currics...)
    headers = [" ", "Complexity", "Blocking Factor", "Delay Factor"]
    data = [get_deets(curric) for curric in currics]
    return pretty_table(data, headers)
end

The data I am getting back is:

3-element Array{Array{Any,2},1}:
 ["" 393.0 184 209.0]
 ["" 361.0 164 197.0]
 ["" 363.0 165 198.0]

However, I need something that looks like this:

3×4 Array{Any,2}:
 ""  393.0  184  209.0
 ""  361.0  164  197.0
 ""  363.0  165  198.0

Upvotes: 0

Views: 43

Answers (2)

I would replace the comprehension [get_deets(curric) for curric in currics] with a reduction.

For example:

using Random

function getdeets(curric)
    # random "deets", as a 1-D Vector
    return [randstring(4), rand(), 10rand(), 100rand()]
end

function getdata(currics)
    # All 1-D vectors are concatenated horizontally, to produce a
    # 2-D matrix with "deets" as columns (efficient since Julia matrices
    # are stored in column major order)
    data = reduce(hcat, getdeets(curric) for curric in currics)
    return data
end

With this, you get a slightly different structure from the one you asked for: it is transposed (one column per curriculum), which should be more efficient overall because each set of "deets" is contiguous in memory.

julia> getdata(1:3)
4×3 Array{Any,2}:
   "B2Mq"     "S0hO"      "6KCn"
  0.291359   0.00046518  0.905285
  4.03026    0.612037    8.6458
 35.3133    79.3744      6.49379


If you want your tabular data laid out as in your question (one row per curriculum), this solution can easily be adapted:

function getdeets(curric)
    # random "deets", as a row matrix
    return [randstring(4) rand() 10rand() 100rand()]
end

function getdata(currics)
    # All rows are concatenated vertically, to produce a
    # 2-D matrix
    data = reduce(vcat, getdeets(curric) for curric in currics)
    return data
end

This produces:

julia> getdata(1:3)
3×4 Array{Any,2}:
 "eU7p"  0.563626  0.282499  52.1877
 "3pIw"  0.646435  8.16608   27.534
 "AI6z"  0.86198   0.235428  25.7382
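
Applied to the code in the question, the same idea could look like the sketch below. The curriculum objects are not shown in the question, so make_curric is a hypothetical stand-in (a NamedTuple with a metrics Dict), and build_table is a name I made up; only get_deets comes from the question. The pretty_table call is left out, since its signature depends on the PrettyTables version.

```julia
# Hypothetical stand-in for the curriculum objects in the question:
# a NamedTuple with a `metrics` Dict whose values are vectors.
make_curric(c, b, d) = (metrics = Dict("complexity" => [c],
                                       "blocking factor" => [b],
                                       "delay factor" => [d]),)

# Same logic as the question's get_deets: returns a 1×4 row matrix.
function get_deets(curric)
    m = curric.metrics
    return ["" m["complexity"][1] m["blocking factor"][1] m["delay factor"][1]]
end

# Stack the 1×4 rows vertically into a single N×4 matrix
# (build_table is an illustrative name, not from the question).
build_table(currics...) = reduce(vcat, [get_deets(c) for c in currics])

data = build_table(make_curric(393.0, 184, 209.0),
                   make_curric(361.0, 164, 197.0),
                   make_curric(363.0, 165, 198.0))
```

Here data should be a 3×4 matrix rather than a vector of 1×4 matrices, so it can be handed to pretty_table as one table.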

Upvotes: 1

Przemyslaw Szufel

Reputation: 42194

For what you want to do, it looks like you need a DataFrame rather than an Array. Look at the sample Julia session below:

julia> using DataFrames, Random

julia> df = DataFrame(_=randstring(4), Complexity=rand(4), Blocking_Factor=rand(4), Delay_Factor=rand(4))
4×4 DataFrame
│ Row │ _      │ Complexity │ Blocking_Factor │ Delay_Factor │
│     │ String │ Float64    │ Float64         │ Float64      │
├─────┼────────┼────────────┼─────────────────┼──────────────┤
│ 1   │ S6vT   │ 0.817189   │ 0.00723053      │ 0.358754     │
│ 2   │ S6vT   │ 0.569289   │ 0.978932        │ 0.385238     │
│ 3   │ S6vT   │ 0.990195   │ 0.232987        │ 0.434745     │
│ 4   │ S6vT   │ 0.59623    │ 0.113731        │ 0.871375     │

julia> Matrix(df[!,2:end])
4×3 Array{Float64,2}:
 0.817189  0.00723053  0.358754
 0.569289  0.978932    0.385238
 0.990195  0.232987    0.434745
 0.59623   0.113731    0.871375

Note that in the last step we converted the numeric part of the data into an Array (I assume you need an Array at some point). This Array contains only Float64 elements. In practice, this means that no boxing occurs when storing values, and operations on such an Array can be much faster (roughly 40 times faster for mean in the benchmark below). To illustrate the point, have a look at the code below (I copy the data from df into two almost identical Arrays).

julia> m = Matrix(df[!,2:end])
4×3 Array{Float64,2}:
 0.817189  0.00723053  0.358754
 0.569289  0.978932    0.385238
 0.990195  0.232987    0.434745
 0.59623   0.113731    0.871375

julia> m2 = Matrix{Any}(df[!,2:end])
4×3 Array{Any,2}:
 0.817189  0.00723053  0.358754
 0.569289  0.978932    0.385238
 0.990195  0.232987    0.434745
 0.59623   0.113731    0.871375

julia> using BenchmarkTools, Statistics  # Statistics provides mean

julia> @btime mean($m)
  5.099 ns (0 allocations: 0 bytes)
0.5296580253263143

julia> @btime mean($m2)
  203.103 ns (12 allocations: 192 bytes)
0.5296580253263143
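
If you already have an Array{Any,2} like the one in the question, you do not strictly need a DataFrame to get a concretely typed numeric block. A minimal sketch, assuming every entry in columns 2 to 4 is a number (the variable names are mine, and the values are copied from the question):

```julia
using Statistics  # for mean

# Rows shaped like the question's output: 1×4 matrices with element type Any.
rows = [["" 393.0 184 209.0],
        ["" 361.0 164 197.0],
        ["" 363.0 165 198.0]]

data = reduce(vcat, rows)             # 3×4 matrix with element type Any

# Convert the numeric columns element by element; this throws an error
# if any entry cannot be converted to Float64.
numeric = Float64.(data[:, 2:end])    # 3×3 matrix with element type Float64

eltype(data), eltype(numeric)
mean(numeric)
```

The label column stays in data, while numeric holds unboxed Float64 values for any computation.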

Upvotes: 0
