Reputation: 21
I am building an array of tuples from a dataframe in Julia Version 1.0.2 (2018-11-08).
In earlier versions of Julia I was able to create an array of tuples using DataArray as follows,
where df is my dataframe and p_1 through p_30 are the selected columns:
tuple.array= collect(zip(df[:p_1], convert(DataArray, df[:p_2]),df[:p_3], df[:p_4], df[:p_5],df[:p_6], df[:p_7], df[:p_8], df[:p_9], df[:p_10],df[:p_11],
df[:p_12], df[:p_13], df[:p_14],df[:p_15],df[:p_16],df[:p_17],df[:p_18], df[:p_19],df[:p_20],df[:p_21],df[:p_22],df[:p_23],df[:p_24],df[:p_25],
df[:p_26], df[:p_27],df[:p_28], df[:p_29], df[:p_30]))
However, this now gives an error: UndefVarError: DataArray not defined
I am now trying to build the array of tuples using a for loop, but this is not working either.
k = []
function f(x,
           k::Array{Tuple{Int64,Float64,Float64,Float64,Int64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,String,Float64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,String,Int64,Float64,String},1}
           )
    for i = 1:2
        a = x[i,1], x[i,2], x[i,3], x[i,4], x[i,5], x[i,6], x[i,7], x[i,8],
            x[i,9], x[i,10], x[i,11], x[i,12], x[i,13], x[i,14], x[i,15], x[i,16],
            x[i,17], x[i,18], x[i,19], x[i,20], x[i,21], x[i,22], x[i,23], x[i,24],
            x[i,25], x[i,26], x[i,27], x[i,28], x[i,29], x[i,30]
        k[i] = a
    end
    k
end
ans = f(df)
Results should look like:
[(1, 35455.87, 5.5, 4.5, 83, 0.06, 0.000166, 4.0e-5, 2.45e-9, 5.93e-11, 0.25, 0.01851, 0.33, 0.5, 0.01851, "FC", 258.129, 90, 0, 120, 240, 360, 420, 0, 0, 5000, "10/12/2017", 54, 0.1, "TRUE"),(2, 1.05e6, 4.75, 4.0, 83, 0.06, 0.000125, 2.95e-5, 1.85e-9, 5.88e-11, 0.25, 0.01851, 0.33, 0.5, 0.01851, "FC", 258.129, 90, 0, 120, 240, 360, 420, 0, 0, 5000, "10/12/2017", 54, 0.1, "TRUE")]
Array{Tuple{Int64,Float64,Float64,Float64,Int64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,String,Float64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,String,Int64,Float64,String},1}
It looks like it does not combine the tuples. Instead, this is the output:
(1, 35455.87, 5.5, 4.5, 83, 0.06, 0.000166, 4.0e-5, 2.45e-9, 5.93e-11, 0.25, 0.01851, 0.33, 0.5, 0.01851, "FC", 258.129, 90, 0, 120, 240, 360, 420, 0, 0, 5000, "10/12/2017", 54, 0.1, "TRUE")
(2, 1.05e6, 4.75, 4.0, 83, 0.06, 0.000125, 2.95e-5, 1.85e-9, 5.88e-11, 0.25, 0.01851, 0.33, 0.5, 0.01851, "FC", 258.129, 90, 0, 120, 240, 360, 420, 0, 0, 5000, "10/12/2017", 54, 0.1, "TRUE")
(4, 30)
The last value, (4, 30), seems to be the size of my dataframe:
typeof(ans)
Tuple{Int64,Int64}
Upvotes: 2
Views: 1237
Reputation: 69839
This is how you can convert a DataFrame to an array of tuples:
julia> df = DataFrame(rand(4,5))
4×5 DataFrame
│ Row │ x1 │ x2 │ x3 │ x4 │ x5 │
│ │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼──────────┼──────────┼──────────┼──────────┼────────────┤
│ 1 │ 0.672177 │ 0.946374 │ 0.595168 │ 0.722334 │ 0.00143513 │
│ 2 │ 0.705244 │ 0.34661 │ 0.679062 │ 0.19639 │ 0.665722 │
│ 3 │ 0.714121 │ 0.25532 │ 0.334179 │ 0.796099 │ 0.31926 │
│ 4 │ 0.915351 │ 0.101242 │ 0.241781 │ 0.497605 │ 0.255265 │
julia> values.(eachrow(df)) # option 1
4-element Array{NTuple{5,Float64},1}:
(0.6721769645742341, 0.9463742907744139, 0.5951678416468196, 0.7223337537204884, 0.0014351278761846054)
(0.7052438908128555, 0.34661004784791927, 0.6790618125985046, 0.19639048780237434, 0.6657217151376063)
(0.7141207500236866, 0.2553202507731116, 0.33417888761790393, 0.7960990393316545, 0.31926035845300627)
(0.9153512246404232, 0.10124180902852187, 0.24178081071551794, 0.49760454756012784, 0.2552649323458289)
julia> Tuple.(eachrow(df)) # option 2
4-element Array{NTuple{5,Float64},1}:
(0.6721769645742341, 0.9463742907744139, 0.5951678416468196, 0.7223337537204884, 0.0014351278761846054)
(0.7052438908128555, 0.34661004784791927, 0.6790618125985046, 0.19639048780237434, 0.6657217151376063)
(0.7141207500236866, 0.2553202507731116, 0.33417888761790393, 0.7960990393316545, 0.31926035845300627)
(0.9153512246404232, 0.10124180902852187, 0.24178081071551794, 0.49760454756012784, 0.2552649323458289)
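The same approach should carry over to a DataFrame with mixed column types, like the one in the question. Below is a minimal sketch, assuming a reasonably recent DataFrames.jl; the column names id, value and label are made up for illustration and stand in for p_1 through p_30. It also shows the zip approach from the question written without DataArray, which is no longer needed because columns are plain vectors.

```julia
using DataFrames

# Hypothetical three-column stand-in for the 30-column dataframe in the question.
df = DataFrame(id = [1, 2], value = [35455.87, 1.05e6], label = ["FC", "FC"])

# Broadcast Tuple over the rows: one tuple per row, with per-column element types.
rows = Tuple.(eachrow(df))

# The zip approach from the question, passing the column vectors directly.
zipped = collect(zip(df.id, df.value, df.label))
```

On a 64-bit system both rows and zipped should be vectors with element type Tuple{Int64,Float64,String}. If you prefer a loop, note that k = [] creates an empty array, so assigning to k[i] cannot work; push!(k, a) or a preallocated vector would be needed instead.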
Upvotes: 2