rudinable

Reputation: 61

Dataframe from dictionary list in Julia

I'm dealing with a multi-level dictionary in Julia. The outermost dictionary is Dict{String, Any}, which looks like this:

`Dict{String, Any} with 38 entries:
  "2024-11-15:169" => Dict{String, Any}("395.0"=>Any[Dict{String, Any}("mini"=>…
  "2024-06-28:29"  => Dict{String, Any}("418.0"=>Any[Dict{String, Any}("mini"=>…
  "2025-06-20:386" => Dict{String, Any}("750.0"=>Any[Dict{String, Any}("mini"=>…`

Each value is also a dictionary Dict{String, Any} with:

`Dict{String, Any} with 112 entries:
  "475.0" => Any[Dict{String, Any}("mini"=>false, "settlementType"=>"P", "low52…
  "500.0" => Any[Dict{String, Any}("mini"=>false, "settlementType"=>"P", "low52…
  "456.0" => Any[Dict{String, Any}("mini"=>false, "settlementType"=>"P", "low52…`

Ultimately, I want to create a dictionary that has values which are dataframes. I.e. assign to each key from the outermost dictionary a dataframe based on the inner dictionary (With columns like "strike"=[475, 500, 456] or "mini"=[false, false, false] for example).

What does the Any[...] mean in the inner dictionary? How can I get rid of it because I only have dictionary types inside? Then, how do I efficiently collect the info from each dictionary into dataframe columns?

Thanks.

I tried just parsing it with DataFrame(outerdict) but it creates some kind of weird multi-level column index, again with the Any's. Also, with dictionaries as dataframe values it feels quite inefficient to manually create new columns every time from the dictionary entries.

Upvotes: 1

Views: 64

Answers (1)

Przemyslaw Szufel

Reputation: 42214

It is not clear from your question what you need.

So let's assume we have a nested dictionary:

julia> using DataFrames

julia> d = Dict(:df1=>[Dict(:a=>10,:b=>20),Dict(:a=>11,:b=>21)], :df2=>[Dict(:a=>10,:b=>20),Dict(:a=>11,:b=>21)])
Dict{Symbol, Vector{Dict{Symbol, Int64}}} with 2 entries:
  :df2 => [Dict(:a=>10, :b=>20), Dict(:a=>11, :b=>21)]
  :df1 => [Dict(:a=>10, :b=>20), Dict(:a=>11, :b=>21)]

We can convert it to a dictionary of data frames:

julia> dfs = Dict(keys(d) .=> DataFrame.(values(d)))
Dict{Symbol, DataFrame} with 2 entries:
  :df2 => 2×2 DataFrame…
  :df1 => 2×2 DataFrame…

A single data frame looks like this:

julia> dfs[:df1]
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │    10     20
   2 │    11     21

The trick is the correct usage of broadcasting.
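The data in the question has one more level than the example above: each outer key maps to a dictionary of strikes, and each strike maps to a vector of contract dictionaries. Below is a minimal sketch of flattening that shape. It assumes every strike key parses as a number and that the field names (`mini`, `settlementType`) match the truncated output in the question. The helper `to_dataframe` and the sample values are hypothetical.

```julia
using DataFrames

# Hypothetical sample mimicking the layout shown in the question:
# expiry => strike => Any[contract dictionaries]
outerdict = Dict{String, Any}(
    "2024-11-15:169" => Dict{String, Any}(
        "475.0" => Any[Dict{String, Any}("mini" => false, "settlementType" => "P")],
        "500.0" => Any[Dict{String, Any}("mini" => false, "settlementType" => "P")],
    ),
    "2024-06-28:29" => Dict{String, Any}(
        "418.0" => Any[Dict{String, Any}("mini" => false, "settlementType" => "P")],
    ),
)

# Build one data frame per expiry: one row per contract, plus a strike column.
# Assumes `inner` holds at least one contract, otherwise there is no :strike column to sort by.
function to_dataframe(inner)
    df = DataFrame()
    for (strike, contracts) in inner, contract in contracts
        # Convert the String keys to Symbols before pushing the row
        row = Dict{Symbol, Any}(Symbol(k) => v for (k, v) in contract)
        row[:strike] = parse(Float64, strike)
        # cols = :union adds any column that is not yet present in df
        push!(df, row; cols = :union)
    end
    # Dict iteration order is arbitrary, so sort for a stable row order
    return sort!(df, :strike)
end

dfs = Dict(k => to_dataframe(v) for (k, v) in outerdict)
```

With this sketch, `dfs["2024-11-15:169"]` has one row per contract and a column per dictionary key. If some contracts lack a field, `cols = :union` fills the gap with `missing` instead of throwing an error.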

The Any[] in the inner dictionary means a Vector whose element type is Any. Any is an abstract type - the supertype of every Julia type. In my example, writing d = Dict(:df1=>Any[Dict(:a=>10,:b=>20),Dict(:a=>11,:b=>21)], :df2=>Any[Dict(:a=>10,:b=>20),Dict(:a=>11,:b=>21)]) would not have changed the result. Only performance would be lower, because containers with an abstract element type are not recommended in Julia.
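To "get rid of" the Any, the vector can be copied into one with a concrete element type. Here is a small sketch using only Base Julia; the sample values are made up for illustration:

```julia
# A vector declared with element type Any, although every element is a Dict
v = Any[
    Dict{String, Any}("mini" => false, "strike" => 475.0),
    Dict{String, Any}("mini" => false, "strike" => 500.0),
]

# Broadcasting identity allocates a new vector and narrows its element type
# to the tightest type that fits all elements
narrowed = identity.(v)

# Alternatively, request a specific element type explicitly
typed = convert(Vector{Dict{String, Any}}, v)

println(eltype(v))
println(eltype(narrowed))
```

This only narrows the outer container. The values inside each dictionary are still typed as Any, so the gain is mostly in readability unless the inner dictionaries are converted as well.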

Upvotes: 0
