Reputation: 555
[A Julia noob question.]
Let's say I have a vector of dataframes, as below:
using DataFrames
# On DataFrames 1.x, the matrix constructor needs `:auto` to generate column names
A = DataFrame([8.1 9.2 7.5 6.6; 6.9 8.1 6.8 5.8], :auto)
B = DataFrame([9.0 2.1 5.2 5.3; 1.2 4.9 9.8 7.7], :auto)
dfs = [A, B]
Of course, I actually have a much larger number of dataframes in dfs than in this MWE, but all of them have the same dimensions and only numeric columns.
I would like to transform dfs into a multidimensional (here, 2x4x2) array arr, with arr[:, :, 1] equal to A and arr[:, :, 2] equal to B. How can I perform this transformation? (Of course, a for loop might do the trick, but I guess that there is a more elegant way to proceed.)
Thanks!
Upvotes: 3
Views: 365
Reputation: 18530
I suppose that
f1(dfs) = cat(Matrix.(dfs)..., dims=3)
is a reasonably elegant one-liner, but it allocates temporaries.
From a speed perspective you can probably beat it easily with the following one-liner
f2(dfs) = [ dfs[k][n,m] for n = 1:size(dfs[1],1), m = 1:size(dfs[1],2), k = 1:length(dfs) ]
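As a quick sanity check, the following sketch confirms that both one-liners produce the layout the question asks for. It assumes DataFrames 1.x (hence the `:auto` argument, which is not in the original snippets) and reuses the toy data from the question.

```julia
using DataFrames

# Toy data from the question; `:auto` generates column names (assumes DataFrames 1.x).
A = DataFrame([8.1 9.2 7.5 6.6; 6.9 8.1 6.8 5.8], :auto)
B = DataFrame([9.0 2.1 5.2 5.3; 1.2 4.9 9.8 7.7], :auto)
dfs = [A, B]

# Convert each DataFrame to a Matrix, then concatenate along a new third dimension.
f1(dfs) = cat(Matrix.(dfs)..., dims=3)

# Build the 3-D array element by element with a three-index comprehension.
f2(dfs) = [dfs[k][n, m] for n in 1:size(dfs[1], 1), m in 1:size(dfs[1], 2), k in 1:length(dfs)]

arr = f1(dfs)
println(size(arr))                  # (2, 4, 2)
println(arr[:, :, 1] == Matrix(A))  # true
println(arr[:, :, 2] == Matrix(B))  # true
println(arr == f2(dfs))             # true
```

Note that f1 splats the whole vector into cat, so with a very large number of dataframes the compile time and argument handling may become noticeable.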
Having said that, if you're willing to be a little more verbose, you can probably do better again using the iteration protocols specifically designed for use with DataFrame.
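function f3(dfs)
    # Preallocate the output: rows x columns x number of dataframes
    y = Array{Float64,3}(undef, size(dfs[1],1), size(dfs[1],2), length(dfs))
    for k = 1:length(dfs)
        # Copy one whole column at a time into the matching slice
        for (n,col) in enumerate(eachcol(dfs[k]))
            y[:,n,k] = col
        end
    end
    return y
end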
As a general rule, if you want speed in Julia, loops are often the best approach. Let's do a quick comparison of the three approaches:
julia> using BenchmarkTools
julia> @btime f1($dfs);
182.454 μs (132 allocations: 7.89 KiB)
julia> @btime f2($dfs);
935.217 ns (21 allocations: 672 bytes)
julia> @btime f3($dfs);
338.664 ns (11 allocations: 368 bytes)
So, on these timings, f3 is roughly 2.8x faster than f2 and about 540x faster than f1. You could throw an @inbounds in f2 and f3 for further optimization, although I suspect it won't gain you that much...
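For illustration, here is one way the @inbounds suggestion might look when applied to f3. The name `f3_inbounds` and the up-front size check are my additions, not part of the answer above. Because @inbounds can skip bounds checks, the sketch verifies first that every dataframe has the same size. It assumes DataFrames 1.x and Float64 columns.

```julia
using DataFrames

# Hypothetical variant of f3 with bounds checking disabled in the inner copy.
function f3_inbounds(dfs)
    nr, nc = size(dfs[1])
    # Guard: with @inbounds below, mismatched sizes may no longer raise an error.
    all(df -> size(df) == (nr, nc), dfs) ||
        throw(DimensionMismatch("all dataframes must have the same size"))
    y = Array{Float64,3}(undef, nr, nc, length(dfs))
    for k in eachindex(dfs)
        for (n, col) in enumerate(eachcol(dfs[k]))
            @inbounds y[:, n, k] = col   # copy column n of dataframe k
        end
    end
    return y
end

# Usage with the toy data from the question (`:auto` assumes DataFrames 1.x).
A = DataFrame([8.1 9.2 7.5 6.6; 6.9 8.1 6.8 5.8], :auto)
B = DataFrame([9.0 2.1 5.2 5.3; 1.2 4.9 9.8 7.7], :auto)
y = f3_inbounds([A, B])
println(y[:, :, 2] == Matrix(B))   # true
```

Whether this is measurably faster is best checked with @btime on your own data, since most of the time goes into the column copies rather than the bounds checks.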
Now, to be fair, I just assumed everything was Float64 here. However, with a quick type check up front, you can generalise this to any type (as long as it is all one type, which presumably it is given that you want to convert to a single array).
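A minimal sketch of that generalisation might look like the following. The name `f3_generic` is hypothetical, and the choice to throw an ArgumentError on mixed element types is an assumption on my part; promoting to a common type would be another option. It assumes DataFrames 1.x and a non-empty dfs.

```julia
using DataFrames

# Hypothetical generalisation of f3: infer the element type from the first column,
# then check that every column of every dataframe shares it.
function f3_generic(dfs)
    T = eltype(first(eachcol(dfs[1])))
    all(df -> all(col -> eltype(col) == T, eachcol(df)), dfs) ||
        throw(ArgumentError("all columns must have element type $T"))
    y = Array{T,3}(undef, size(dfs[1], 1), size(dfs[1], 2), length(dfs))
    for k in eachindex(dfs)
        for (n, col) in enumerate(eachcol(dfs[k]))
            y[:, n, k] = col   # copy column n of dataframe k
        end
    end
    return y
end

# Usage with integer data (`:auto` assumes DataFrames 1.x).
C = DataFrame([1 2; 3 4], :auto)
D = DataFrame([5 6; 7 8], :auto)
z = f3_generic([C, D])
println(eltype(z))   # Int64 on a 64-bit machine
println(size(z))     # (2, 2, 2)
```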
Upvotes: 3