rotsoc

Reputation: 53

Make a matrix out of a dictionary in Julia where the keys are strings I have to split

I have a dictionary of the form

"san-diego.new-york" => 0.225
"seattle.topeka"     => 0.162
"san-diego.chicago"  => 0.162
"seattle.new-york"   => 0.225
"san-diego.topeka"   => 0.126
"seattle.chicago"    => 0.153

I want to transform this into a 2x3 matrix where the row index i ranges over the set {san-diego, seattle} and the column index j ranges over the set {new-york, topeka, chicago}. I tried splitting the keys with split.(keys(dict), ".") but didn't get anywhere.

I want to do this so that I can afterwards write things of the form M[i, j] = 0.5.

edit: I made a new dictionary where the keys are tuples. I don't know if this helps.

c = Dict("san-diego.new-york" => 0.225,
         "seattle.topeka"     => 0.162,
         "san-diego.chicago"  => 0.162,
         "seattle.new-york"   => 0.225,
         "san-diego.topeka"   => 0.126,
         "seattle.chicago"    => 0.153)

a = split.(keys(c),".")
b = collect(values(c))

new_c = Dict((a[i][1],a[i][2])=>b[i] for i in 1:length(b))

I ended up writing the following function

function fillmatrix()

    c = Dict("san-diego.new-york" => 0.225,
             "seattle.topeka"     => 0.162,
             "san-diego.chicago"  => 0.162,
             "seattle.new-york"   => 0.225,
             "san-diego.topeka"   => 0.126,
             "seattle.chicago"    => 0.153)

    a = split.(keys(c),".")
    b = collect(values(c))

    new_c = Dict((a[i][1],a[i][2])=>b[i] for i=1:length(b))

    list_i = []
    list_j = []
    for (u,v) in keys(new_c)
        push!(list_i,u)
        push!(list_j,v)
    end

    i = unique(list_i)
    j = unique(list_j)

    A = zeros((length(i),length(j)))


    for ii in i
        for jj in j
            A[findfirst(x->x==ii,i),findfirst(x->x==jj,j)] = new_c[(ii,jj)]
        end
    end

    return A
end

But this seems like a long workaround and I would like to generalize it to more dimensions. Any thoughts? Thanks in advance.

Upvotes: 1

Views: 492

Answers (1)

Bogumił Kamiński

Reputation: 69869

I will give a solution starting from your original dictionary (the tuple-keyed one can be handled analogously). You can use the NamedArrays.jl package to solve your problem. Here is a full solution:

using NamedArrays

d = Dict("san-diego.new-york" => 0.225,
         "seattle.topeka"     => 0.162,
         "san-diego.chicago"  => 0.162,
         "seattle.new-york"   => 0.225,
         "san-diego.topeka"   => 0.126,
         "seattle.chicago"    => 0.153)

s = split.(keys(d), '.')
row = unique(string.(getindex.(s, 1)))
col = unique(string.(getindex.(s, 2)))

m = NamedArray([d[r*"."*c] for r in row, c in col],
               (row, col), ("from", "to"))

(This assumes that all row-column pairs are present in the dictionary. If some are not, write get(d, r*"."*c, missing) instead of d[r*"."*c]; the entries that are absent from the dictionary will then hold missing.)
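As a minimal sketch of the get-based variant, here is the same comprehension on a deliberately incomplete dictionary, using only base Julia (no NamedArray wrapper). The shortened dictionary and the name vals are just for illustration:

```julia
# Deliberately incomplete: three of the six row-column pairs are absent.
d = Dict("san-diego.new-york" => 0.225,
         "seattle.topeka"     => 0.162,
         "san-diego.chicago"  => 0.162)

s = split.(keys(d), '.')
row = unique(string.(getindex.(s, 1)))
col = unique(string.(getindex.(s, 2)))

# get returns the stored value, or `missing` when the key is not in d
vals = [get(d, r * "." * c, missing) for r in row, c in col]
```

The result is still a 2×3 matrix, and its absent entries can be located with ismissing.(vals) or skipped with skipmissing(vals).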

And now you can write:

julia> m
2×3 Named Array{Float64,2}
from ╲ to │ new-york    topeka   chicago
──────────┼─────────────────────────────
san-diego │    0.225     0.126     0.162
seattle   │    0.225     0.162     0.153

julia> m["san-diego", "new-york"]
0.225

julia> m[2,3]
0.153

(essentially you can use names or integer indices to reference columns/rows)
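If you would rather stay in base Julia, name-based lookup can be approximated by keeping one label => index dictionary per dimension. The sketch below is not how NamedArrays.jl is implemented; dict_to_array is a hypothetical helper, and it assumes every key has the same number of '.'-separated parts. Under that assumption it also covers more than two dimensions:

```julia
# Hypothetical helper (not part of NamedArrays.jl): build a dense N-dimensional
# array from a Dict whose keys are delimiter-separated labels.
# Assumes every key splits into the same number of parts.
function dict_to_array(d::AbstractDict{<:AbstractString,T}; delim = '.') where {T}
    parts = [string.(split(k, delim)) for k in keys(d)]
    n = length(first(parts))
    # unique labels along each dimension, in first-seen order
    labels = [unique(p[dim] for p in parts) for dim in 1:n]
    # one label => integer position lookup table per dimension
    index = [Dict(l => i for (i, l) in enumerate(ls)) for ls in labels]
    # combinations that are absent from d stay `missing`
    A = Array{Union{Missing,T}}(missing, length.(labels)...)
    for (p, v) in zip(parts, values(d))
        A[(index[dim][p[dim]] for dim in 1:n)...] = v
    end
    return A, labels, index
end

d = Dict("san-diego.new-york" => 0.225,
         "seattle.topeka"     => 0.162,
         "san-diego.chicago"  => 0.162,
         "seattle.new-york"   => 0.225,
         "san-diego.topeka"   => 0.126,
         "seattle.chicago"    => 0.153)

A, labels, index = dict_to_array(d)
# translate names to integer positions, then index the plain array
x = A[index[1]["seattle"], index[2]["chicago"]]
```

The order of the labels follows the iteration order of the dictionary, so use index (or labels) rather than hard-coded positions when reading values back.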

Also note that I convert the row and col entries to String. We could leave them as SubStrings (i.e. omit the string. part of the call), but String looks a bit nicer when printed as NamedArray row/column names.
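To illustrate the String versus SubString point, here is a small base-Julia sketch (the names parts and labels are only for this example). It shows that split yields SubStrings, that string. converts them, and that concatenating SubStrings with * still produces a key that the dictionary lookup accepts:

```julia
parts = split("san-diego.new-york", '.')
eltype(parts)             # SubString{String}

labels = string.(parts)   # converted to a Vector{String}

d = Dict("san-diego.new-york" => 0.225)
# SubString * String * SubString concatenates to a String, so the lookup works
v = d[parts[1] * "." * parts[2]]
```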

Upvotes: 3
