Reputation: 2336
I'm working with a dataset that I read in from a CSV file. I have columns `p1`, `p2`, `p3`, and `p4` that I would like to combine into a single column whose value in each row is the array `[p1 p2 p3 p4]`.
```
x = DataFrame(randn(100,4))
names!(x, [:p1; :p2; :p3; :p4])

x[:test] = x[[:p1, :p2, :p3, :p4]]
x # Doesn't work
```
With the code above, every row of the new `test` column contains the entire 100x4 `DataFrames.DataFrame` instead of that row's four values.
I saw the question "Julia dataframe where a column is an array of arrays?", but it doesn't address how to add a new array column computed from the existing columns of the table.
Upvotes: 4
Views: 1993
Reputation: 2862
The value assigned to a new column should be a `Vector` with one element per row, but `x[[:p1, :p2, :p3, :p4]]` is a `DataFrame`, so it gets repeated into a `Vector` of `DataFrame`s, one copy per row.
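I suggest using a `Tuple` rather than a `Vector` for better performance: a tuple of four `Float64` values is an immutable bits type stored inline in the column, whereas a `Vector` per row means a separate heap allocation for every row. This can be achieved with: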
```
x[:test] = collect(zip(x[:p1], x[:p2], x[:p3], x[:p4]))
```
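To see the difference in memory layout, here is a minimal sketch in plain Julia (no DataFrames needed). It assumes four hypothetical vectors `p1` to `p4` standing in for the columns and compares the element types produced by the tuple and vector approaches:

```julia
# Hypothetical stand-ins for the columns p1..p4 (5 rows of random data)
p1, p2, p3, p4 = randn(5), randn(5), randn(5), randn(5)

# One NTuple{4,Float64} per row: an isbits type, stored inline in the array
rows_tuple = collect(zip(p1, p2, p3, p4))

# One Vector{Float64} per row: each element is a separately allocated array
rows_vector = map(collect, zip(p1, p2, p3, p4))

println(eltype(rows_tuple), " isbits: ", isbitstype(eltype(rows_tuple)))
println(eltype(rows_vector), " isbits: ", isbitstype(eltype(rows_vector)))
```

Both hold the same numbers row by row; only the tuple version avoids one allocation per row.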
If you really need a `Vector`, this code does it:
```
x[:test] = map(collect, zip(x[:p1], x[:p2], x[:p3], x[:p4]))
```
(This looks a bit tricky: calling `collect` on a `Tuple` returns a `Vector`.)
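The `x[:p1]` column indexing, the one-argument `DataFrame(matrix)` constructor and `names!` used in this thread belong to an older DataFrames.jl API that current releases no longer accept. A sketch of the same two approaches, assuming DataFrames.jl 1.x is installed; the column name `test_vec` is hypothetical, and `transform` with `ByRow(vcat)` is a swapped-in alternative to `map(collect, zip(...))`, not part of the original answer:

```julia
using DataFrames  # assumes DataFrames.jl 1.x

# Same toy data as in the question, with column names supplied up front
x = DataFrame(randn(100, 4), [:p1, :p2, :p3, :p4])

# Tuple per row: property syntax replaces the old x[:p1] indexing
x.test = collect(zip(x.p1, x.p2, x.p3, x.p4))

# Vector per row: ByRow calls vcat(p1, p2, p3, p4) on each row's scalars,
# which returns a Vector{Float64}; transform adds it as a new column
x = transform(x, [:p1, :p2, :p3, :p4] => ByRow(vcat) => :test_vec)
```

Either column holds one value per row; as above, the `Tuple` column avoids allocating a separate array for every row.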
Upvotes: 5