Reputation: 909
I want to extract the 3rd and 7th rows of a data frame in Julia. The MWE is:
using DataFrames
my_data = DataFrame(A = 1:10, B = 16:25);
my_data
10×2 DataFrame
│ Row │ A │ B │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 16 │
│ 2 │ 2 │ 17 │
│ 3 │ 3 │ 18 │
│ 4 │ 4 │ 19 │
│ 5 │ 5 │ 20 │
│ 6 │ 6 │ 21 │
│ 7 │ 7 │ 22 │
│ 8 │ 8 │ 23 │
│ 9 │ 9 │ 24 │
│ 10 │ 10 │ 25 │
Upvotes: 4
Views: 1160
Reputation: 42204
A nice thing about Julia is that you often do not need to materialize the result, which saves the memory and time spent copying the data. So if you need a subset of any array-like structure, it is usually better to take a view with @view than to materialize a copy directly:
julia> @view my_data[[3, 7], :]
2×2 SubDataFrame
│ Row │ A │ B │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 3 │ 18 │
│ 2 │ 7 │ 22 │
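To make the difference between a view and a materialized selection concrete, here is a minimal sketch using the same my_data as in the question. It assumes a DataFrames.jl version in which the range columns are stored as ordinary mutable vectors (as in the 1.x releases); the names v and c are only illustrative.

```julia
using DataFrames

my_data = DataFrame(A = 1:10, B = 16:25)

# A view of rows 3 and 7: no column data is copied.
v = @view my_data[[3, 7], :]

# A materialized selection: rows 3 and 7 are copied into a new DataFrame.
c = my_data[[3, 7], :]

# Modify row 3 of the parent data frame
# (assumes column A is stored as a mutable Vector).
my_data[3, :A] = 100

v.A  # the view reflects the change made to the parent
c.A  # the copy still holds the original values
```

Because the view shares its data with the parent, it is cheap to create, but later changes to my_data show through it.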
Now for a performance comparison:
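using Statistics  # provides mean

# Materializes a new DataFrame holding copies of rows 3 and 7.
function submean1(df)
    d = df[[3, 7], :]
    mean(d.A)
end

# Takes a view of rows 3 and 7 without copying the data.
function submean2(df)
    d = @view df[[3, 7], :]
    mean(d.A)
end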
And the benchmarks:
julia> using BenchmarkTools
julia> @btime submean1($my_data)
689.262 ns (19 allocations: 1.38 KiB)
5.0
julia> @btime submean2($my_data)
582.315 ns (9 allocations: 288 bytes)
5.0
Even in this simplistic example, @view is about 15% faster and allocates nearly five times less memory (288 bytes versus 1.38 KiB). Of course, sometimes you do want a copy of the data, but the rule of thumb is not to materialize.
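For the cases where a copy really is wanted, a view can be materialized afterwards. The sketch below assumes a recent DataFrames.jl release, where copy applied to a SubDataFrame returns an independent DataFrame; the names v and d are illustrative only.

```julia
using DataFrames

my_data = DataFrame(A = 1:10, B = 16:25)

# Start from a view of rows 3 and 7.
v = @view my_data[[3, 7], :]

# Materialize the view into an independent DataFrame.
d = copy(v)

# Changing the copy does not touch the parent data frame.
d[1, :A] = -1

my_data[3, :A]  # the parent still holds its original value
```

This keeps the cheap view as the default and pays for the copy only at the point where independent data is needed.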
Upvotes: 2
Reputation: 909
Indexing with a vector of row numbers, and : for all columns, gives the expected output:
using DataFrames
my_data = DataFrame(A = 1:10, B = 16:25);
my_data[[3, 7], :]  # rows 3 and 7, all columns
2×2 DataFrame
│ Row │ A │ B │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 3 │ 18 │
│ 2 │ 7 │ 22 │
Upvotes: 4