Reputation: 31
Does Julia have any analogues of the nest and unnest functions from the tidyr R package? In particular, is there an efficient way to do nesting / unnesting operations with DataFrames.jl?
Upvotes: 3
Views: 525
Reputation: 42214
Suppose you have the following DataFrame:
julia> d = DataFrame(g=[1,1,1,2,2,3,3,], val1=1:7, val2 = 'a':'g')
7×3 DataFrame
│ Row │ g │ val1 │ val2 │
│ │ Int64 │ Int64 │ Char │
├─────┼───────┼───────┼──────┤
│ 1 │ 1 │ 1 │ 'a' │
│ 2 │ 1 │ 2 │ 'b' │
│ 3 │ 1 │ 3 │ 'c' │
│ 4 │ 2 │ 4 │ 'd' │
│ 5 │ 2 │ 5 │ 'e' │
│ 6 │ 3 │ 6 │ 'f' │
│ 7 │ 3 │ 7 │ 'g' │
and assume that you want to sample one element from each group defined by the g column.
This can be achieved by:
julia> DataFrame([rand(eachrow(gr)) for gr in groupby(d,:g)])
3×3 DataFrame
│ Row │ g │ val1 │ val2 │
│ │ Int64 │ Int64 │ Char │
├─────┼───────┼───────┼──────┤
│ 1 │ 1 │ 2 │ 'b' │
│ 2 │ 2 │ 4 │ 'd' │
│ 3 │ 3 │ 6 │ 'f' │
Hope this is what you need.
If you want a different number of rows from each group (drawn with replacement, so rows can repeat), you could do something like this:
julia> g_to_rows=Dict(1=>4,2=>3,3=>7); # desired element counts
julia> [ gr[rand(1:nrow(gr),g_to_rows[gr.g[1]]), :] for gr in groupby(d,:g)]
3-element Array{DataFrame,1}:
4×3 DataFrame
│ Row │ g │ val1 │ val2 │
│ │ Int64 │ Int64 │ Char │
├─────┼───────┼───────┼──────┤
│ 1 │ 1 │ 1 │ 'a' │
│ 2 │ 1 │ 1 │ 'a' │
│ 3 │ 1 │ 3 │ 'c' │
│ 4 │ 1 │ 2 │ 'b' │
3×3 DataFrame
│ Row │ g │ val1 │ val2 │
│ │ Int64 │ Int64 │ Char │
├─────┼───────┼───────┼──────┤
│ 1 │ 2 │ 5 │ 'e' │
│ 2 │ 2 │ 5 │ 'e' │
│ 3 │ 2 │ 5 │ 'e' │
7×3 DataFrame
│ Row │ g │ val1 │ val2 │
│ │ Int64 │ Int64 │ Char │
├─────┼───────┼───────┼──────┤
│ 1 │ 3 │ 7 │ 'g' │
│ 2 │ 3 │ 6 │ 'f' │
│ 3 │ 3 │ 6 │ 'f' │
│ 4 │ 3 │ 7 │ 'g' │
│ 5 │ 3 │ 7 │ 'g' │
│ 6 │ 3 │ 6 │ 'f' │
│ 7 │ 3 │ 6 │ 'f' │
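For the nesting / unnesting round trip that the question asks about, here is a rough sketch rather than a definitive recipe. It assumes a recent DataFrames.jl (1.x), where combine accepts source => function => target pairs and flatten expands vector-valued columns; the names nested and unnested are only illustrative. Whether this is efficient enough for a given workload is something you would have to measure.

```julia
using DataFrames

# Same example table as above.
d = DataFrame(g = [1, 1, 1, 2, 2, 3, 3], val1 = 1:7, val2 = 'a':'g')

# "Nest": keep one row per group and store each group's values as a vector.
# Returning a Ref makes combine treat the vector as a single cell
# instead of spreading it over several rows.
nested = combine(groupby(d, :g),
                 :val1 => (v -> Ref(collect(v))) => :val1,
                 :val2 => (v -> Ref(collect(v))) => :val2)

# "Unnest": expand the vector-valued columns back into one row per element.
# Both columns are flattened together, so their lengths must match row by row.
unnested = flatten(nested, [:val1, :val2])
```

Here nested has one row per value of g, with val1 and val2 holding vectors, and flatten should bring back the original long layout. Collecting the group values copies them; dropping collect would store views into d instead, which may or may not be what you want.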
Upvotes: 2