numbersguy132

Reputation: 137

Reading Strings as Vectors in Julia

I currently have a Julia DataFrame of the form

| A | B |
|---|---|
| "[1,2]" | "[3,4]" |

and would like to convert it to the form

| A1 | A2 | B1 | B2 |
|---|---|---|---|
| 1 | 2 | 3 | 4 |

or to the form below, where the vectors are no longer strings:

| A | B |
|---|---|
| [1,2] | [3,4] |

Is there any way to do this? I have already looked at a few posts where people convert vectors of the form ["1", "2"] to the form [1,2], but nothing along the lines of what I have.

Thanks for the help.

Upvotes: 4

Views: 202

Answers (2)

Dan Getz

Reputation: 18227

Define

```julia
using DataFrames

# WARNING: `parsedf` parses/evaluates input text and
# is therefore an infosec weakness. Should be used only
# with properly vetted input.

# Parse every string cell (e.g. "[1,2]") into the Julia value it denotes.
parsedf(df) = DataFrame([c => eval.(Meta.parse.(df[:, c]))
                         for c in names(df)])

# Spread each vector column `c` into columns c1, c2, ..., one per position,
# padding shorter vectors with `missing`.
spreaddf(df) = DataFrame([c * "$i" => get.(df[:, c], i, missing)
                          for c in names(df)
                          for i in 1:maximum(length.(df[:, c]))])
```
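If the input cannot be fully trusted, the `eval` step can be avoided altogether. The sketch below assumes every cell is a bracketed, comma-separated list of integers such as "[1,2]"; the helper name `parseintvec` is hypothetical and is not part of Base or DataFrames.

```julia
# Hypothetical helper: parse a string such as "[1,2]" or "[ 7, 8, 9 ]"
# into a Vector{Int} without calling `eval` (assumes integer entries only).
function parseintvec(s::AbstractString)
    inner = strip(strip(s), ['[', ']'])    # drop outer whitespace, then the brackets
    isempty(strip(inner)) && return Int[]  # "[]" becomes an empty vector
    return [parse(Int, strip(t)) for t in split(inner, ',')]
end

parseintvec("[1,2]")        # [1, 2]
parseintvec("[ 7, 8, 9 ]")  # [7, 8, 9]
```

Under that assumption, `parsedf` could use `parseintvec.(df[:, c])` in place of `eval.(Meta.parse.(df[:, c]))`; non-integer entries would then raise a parse error instead of being evaluated.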

Now,

```julia
julia> df = DataFrame(A=["[1,2]"],B=["[3,4]"])
1×2 DataFrame
 Row │ A       B      
     │ String  String 
─────┼────────────────
   1 │ [1,2]   [3,4]

julia> spreaddf(parsedf(df))
1×4 DataFrame
 Row │ A1     A2     B1     B2    
     │ Int64  Int64  Int64  Int64 
─────┼────────────────────────────
   1 │     1      2      3      4
```

seems to do it.

Also,

```julia
julia> spreaddf(parsedf(DataFrame(A=["[1,2]","[5,6]"], B=["[3,4]","[7,8,9]"])))
2×5 DataFrame
 Row │ A1     A2     B1     B2     B3      
     │ Int64  Int64  Int64  Int64  Int64?  
─────┼─────────────────────────────────────
   1 │     1      2      3      4  missing 
   2 │     5      6      7      8        9
```

seems appropriate.
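The `missing` in column `B3` comes from `get(v, i, missing)`, which returns the default when index `i` is out of bounds for a shorter vector. A minimal base-Julia sketch of that padding step for a single column, using a hypothetical helper `spreadcols` (not part of DataFrames):

```julia
# Hypothetical helper mirroring the padding logic of `spreaddf` for one column:
# given a vector of vectors, return one vector per position, with `missing`
# wherever a row's vector is too short.
function spreadcols(rows::AbstractVector{<:AbstractVector})
    width = maximum(length, rows)              # longest vector decides the column count
    return [get.(rows, i, missing) for i in 1:width]
end

cols = spreadcols([[3, 4], [7, 8, 9]])
# cols[1] is [3, 7], cols[2] is [4, 8], cols[3] is [missing, 9]
```

Because broadcasting picks the element type from the actual results, a position that never needs padding stays a plain `Int` vector, which matches the `Int64` versus `Int64?` column types shown above.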

Upvotes: 0

Bogumił Kamiński

Reputation: 69949

Here is one way to do it:

```julia
julia> using DataFrames

julia> df = DataFrame(A="[1,2]", B="[3,4]")
1×2 DataFrame
 Row │ A       B
     │ String  String
─────┼────────────────
   1 │ [1,2]   [3,4]

julia> select(df, [:A, :B] .=>
                  ByRow(x -> parse.(Int, split(chop(x, head=1, tail=1), ','))) .=>
                  [[:A1, :A2], [:B1, :B2]])
1×4 DataFrame
 Row │ A1     A2     B1     B2
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1      2      3      4
```
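The anonymous function passed to `ByRow` does all of the string work, so it can be tried on a single cell without DataFrames: `chop(x, head=1, tail=1)` removes the first and last characters (the brackets), `split` cuts at the commas, and `parse.(Int, …)` converts each piece. The name `rowparse` below is only for illustration. Because the target names such as `[:A1, :A2]` are written by hand, this approach assumes every vector in a column has exactly that many elements.

```julia
# The row-wise function from the answer above, bound to a name so it can be
# inspected on its own.
rowparse(x) = parse.(Int, split(chop(x, head=1, tail=1), ','))

chop("[1,2]", head=1, tail=1)   # "1,2"
rowparse("[1,2]")               # [1, 2]
rowparse("[3,4]")               # [3, 4]
```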

If something requires an explanation, please ask in the comments.

Upvotes: 2
