Reputation: 1159
I have an array of arrays stored in a column of a DataFrame in Julia.
dfy = DataFrame(a = [[1,2,3],[4,5,6],[7,8,9]], b = ["M","F","F"])
3×2 DataFrame
│ Row │ a │ b │
│ │ Array… │ String │
├─────┼───────────┼────────┤
│ 1 │ [1, 2, 3] │ M │
│ 2 │ [4, 5, 6] │ F │
│ 3 │ [7, 8, 9] │ F │
I would like to take column "a" and store the first value of each row's array in X1 (1, 4, 7), the second value in X2 (2, 5, 8), and the third value in X3 (3, 6, 9).
How can we accomplish this in the Julia programming language?
Upvotes: 2
Views: 449
Reputation: 42194
You could try this:
for i in 1:3
    dfy[:, "X$i"] = getindex.(dfy.a, i)
end
Once run here is the result:
julia> dfy
3×5 DataFrame
│ Row │ a │ b │ X1 │ X2 │ X3 │
│ │ Array… │ String │ Int64 │ Int64 │ Int64 │
├─────┼───────────┼────────┼───────┼───────┼───────┤
│ 1 │ [1, 2, 3] │ M │ 1 │ 2 │ 3 │
│ 2 │ [4, 5, 6] │ F │ 4 │ 5 │ 6 │
│ 3 │ [7, 8, 9] │ F │ 7 │ 8 │ 9 │
The dot (.) after getindex is the broadcasting (vectorization) operator, so getindex.(dfy.a, i) takes the i-th element from each row of the a column of your DataFrame.
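To see the broadcasting step in isolation, here is a minimal sketch in plain Julia, with no DataFrames involved. The variable `a` is only a stand-in for the column `dfy.a`, and `x1`, `x2`, `x3` are illustrative names that do not come from the question.

```julia
# Stand-in for the column dfy.a: a vector whose elements are vectors.
a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# The dot broadcasts getindex over the outer vector, so getindex is
# called once per inner array with the same index.
x1 = getindex.(a, 1)   # equivalent to [v[1] for v in a]
x2 = getindex.(a, 2)
x3 = getindex.(a, 3)

println(x1)   # [1, 4, 7]
println(x2)   # [2, 5, 8]
println(x3)   # [3, 6, 9]
```

The index `i` is a scalar, so broadcasting reuses it for every element of `a`; only the outer vector is iterated.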
Upvotes: 2
Reputation: 69819
I give several options to show you what you can do.
Before I give my options, let me comment on the alternative answer, which in general is the most natural way to get what you want if you want to update the existing data frame. DataFrames.jl does not support indexing by a column name only. A DataFrame is a two-dimensional object, and thus it requires passing both a row index and a column index, like this:
julia> for i in 1:3
           dfy[:, "X$i"] = getindex.(dfy.a, i)
       end
julia> dfy
3×5 DataFrame
│ Row │ a │ b │ X1 │ X2 │ X3 │
│ │ Array… │ String │ Int64 │ Int64 │ Int64 │
├─────┼───────────┼────────┼───────┼───────┼───────┤
│ 1 │ [1, 2, 3] │ M │ 1 │ 2 │ 3 │
│ 2 │ [4, 5, 6] │ F │ 4 │ 5 │ 6 │
│ 3 │ [7, 8, 9] │ F │ 7 │ 8 │ 9 │
(Note that this is what the error message prompts you to do, i.e. that setindex! requires one more argument to be passed.)
Now some more advanced options. The first is:
julia> rename!(x -> "X"*x, DataFrame(Tuple.(dfy.a)))
3×3 DataFrame
│ Row │ X1 │ X2 │ X3 │
│ │ Int64 │ Int64 │ Int64 │
├─────┼───────┼───────┼───────┤
│ 1 │ 1 │ 2 │ 3 │
│ 2 │ 4 │ 5 │ 6 │
│ 3 │ 7 │ 8 │ 9 │
This is the option to use if, as I understand it, you want a new data frame. To create a new data frame combining the old and new columns, just use horizontal concatenation:
julia> [dfy rename!(x -> "X"*x, DataFrame(Tuple.(dfy.a)))]
3×5 DataFrame
│ Row │ a │ b │ X1 │ X2 │ X3 │
│ │ Array… │ String │ Int64 │ Int64 │ Int64 │
├─────┼───────────┼────────┼───────┼───────┼───────┤
│ 1 │ [1, 2, 3] │ M │ 1 │ 2 │ 3 │
│ 2 │ [4, 5, 6] │ F │ 4 │ 5 │ 6 │
│ 3 │ [7, 8, 9] │ F │ 7 │ 8 │ 9 │
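The Tuple.(dfy.a) step used in the two snippets above can be inspected on its own. The following is a rough sketch in plain Julia, where `a` again stands in for `dfy.a` and `rows` and `colnames` are hypothetical names introduced only for illustration. It shows the intermediate value that is handed to the DataFrame constructor and the names that the renaming function "X"*x would produce from the default names "1", "2", "3".

```julia
# Stand-in for the column dfy.a.
a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Broadcasting Tuple converts each inner vector to a tuple,
# giving one tuple per row of the future data frame.
rows = Tuple.(a)

# Mimic the renaming function x -> "X" * x applied to "1", "2", "3".
colnames = ["X" * string(j) for j in 1:length(first(rows))]

println(rows)       # [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
println(colnames)   # ["X1", "X2", "X3"]
```

Because every tuple has the same length, each tuple position lines up with one column of the resulting data frame.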
Finally, if you want to update the existing data frame you can write:
julia> transform!(dfy, [:a => (x -> getindex.(x, i)) => "X$i" for i in 1:3]...)
3×5 DataFrame
│ Row │ a │ b │ X1 │ X2 │ X3 │
│ │ Array… │ String │ Int64 │ Int64 │ Int64 │
├─────┼───────────┼────────┼───────┼───────┼───────┤
│ 1 │ [1, 2, 3] │ M │ 1 │ 2 │ 3 │
│ 2 │ [4, 5, 6] │ F │ 4 │ 5 │ 6 │
│ 3 │ [7, 8, 9] │ F │ 7 │ 8 │ 9 │
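The comprehension passed to transform! builds one source => function => target pair per new column. As a sketch of what those pairs contain, the plain-Julia snippet below constructs the same vector and takes one element apart. The names `ops`, `op`, `src`, `f` and `target` are illustrative assumptions, not part of the DataFrames.jl API.

```julia
# Stand-in for the column dfy.a.
a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Same comprehension as in the transform! call. `=>` is right-associative,
# so each element is :a => (function => "X<i>").
ops = [:a => (x -> getindex.(x, i)) => "X$i" for i in 1:3]

# Take the second operation apart.
op = ops[2]
src = first(op)        # source column name
f, target = last(op)   # function to apply and target column name

println(src)      # a
println(target)   # X2
println(f(a))     # [2, 5, 8]
```

Each anonymous function captures its own value of `i`, which is why the three operations extract three different positions from the arrays.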
Upvotes: 1