TimH
TimH

Reputation: 11

How to combine two arrays iterating over the first row in julia?

Currently I have a following- two matrices A and B. In matrix A the first row corresponds the investment type and the second row is number of scenario (1,2,3) for both investment types bonds and stocks:

A =["bond" "bond" "bond" "stock" "stock" "stock"; 1 2 3 1 2 3; 0.1 0.12 0.13 0.14 0.15 0.16; 0.17 0.18 0.19 0.2 0.21 0.22]

In matrix B I have two values for both bonds and stocks, the past values.

B= ["bond" "stock"; 0.31 0.32; 0.34 0.35]

My data consists of this similar type of data matrix with multiple investment types and >1000 scenarios. How can I merge these two matrices to create following output for the large dataset:

merged_matrix = ["bond" "bond" "bond" "stock" "stock" "stock";1 2 3 1 2 3; 0.31 0.31 0.31 0.32 0.32 0.32; 0.34 0.34 0.34 0.35 0.35 0.35; 0.1 0.12 0.13 0.14 0.15 0.16; 0.17 0.18 0.19 0.2 0.21 0.22]

I have difficulties in generating the right for-loop iteration. I am unsure how to use the investment type as an identifying key. I have tried to format these to DataFrame, but haven't found any suitable solution either. Any help is highly appreciated!

Upvotes: 1

Views: 67

Answers (1)

Bogumił Kamiński
Bogumił Kamiński

Reputation: 69829

Per the comment here is how you can do it in DataFrames.jl (I have given some arbitrary names to the columns - probably you can change it to something more meaningful for your case):

julia> using DataFrames

julia> dfA = identity.(DataFrame(permutedims(A), ["type", "v1", "v2", "v3"]))
6×4 DataFrame
 Row │ type    v1     v2       v3
     │ String  Int64  Float64  Float64
─────┼─────────────────────────────────
   1 │ bond        1     0.1      0.17
   2 │ bond        2     0.12     0.18
   3 │ bond        3     0.13     0.19
   4 │ stock       1     0.14     0.2
   5 │ stock       2     0.15     0.21
   6 │ stock       3     0.16     0.22

julia> dfB = identity.(DataFrame(permutedims(B), ["type", "v4", "v5"]))
2×3 DataFrame
 Row │ type    v4       v5
     │ String  Float64  Float64
─────┼──────────────────────────
   1 │ bond       0.31     0.34
   2 │ stock      0.32     0.35

julia> rightjoin(dfB, dfA, on="type")
6×6 DataFrame
 Row │ type    v4        v5        v1     v2       v3
     │ String  Float64?  Float64?  Int64  Float64  Float64
─────┼─────────────────────────────────────────────────────
   1 │ bond        0.31      0.34      1     0.1      0.17
   2 │ bond        0.31      0.34      2     0.12     0.18
   3 │ bond        0.31      0.34      3     0.13     0.19
   4 │ stock       0.32      0.35      1     0.14     0.2
   5 │ stock       0.32      0.35      2     0.15     0.21
   6 │ stock       0.32      0.35      3     0.16     0.22

Note that I use identity.(...) operation to make column types concrete (as your original data has Any type which is inefficient as was highlighted in the comment).

Upvotes: 2

Related Questions