Reputation: 81
I have been trying to split a data set of 100 rows into two separate variables, based on the value in the first column.
Each row's first column is either 0 or 1, and I want to store the rest of each row without that first column. So far I have tried the following code, but it doesn't work.
data1 = zeros(50,2)
data2 = zeros(50,2)
index1 = 0
index2 = 0
for i in 1:100
    if data[i,1] < 0.5
        data1[index1] = data[i, 2:3]
        index+=1
    else
        data2[index2] = data[i, 2:3]
        index2+=1
    end
end
The only thing I get is this error, which I cannot understand:
MethodError: Cannot `convert` an object of type DataFrames.DataFrameRow{DataFrames.DataFrame,DataFrames.SubIndex{DataFrames.Index,UnitRange{Int64},UnitRange{Int64}}} to an object of type Float64
Closest candidates are:
convert(::Type{T}, !Matched::T) where T<:Number at number.jl:6
convert(::Type{T}, !Matched::Number) where T<:Number at number.jl:7
convert(::Type{T}, !Matched::Base.TwicePrecision) where T<:Number at twiceprecision.jl:250
...
Stacktrace:
[1] setindex!(::Array{Float64,2}, ::DataFrames.DataFrameRow{DataFrames.DataFrame,DataFrames.SubIndex{DataFrames.Index,UnitRange{Int64},UnitRange{Int64}}}, ::Int64) at .\array.jl:825
[2] top-level scope at .\In[47]:9
Upvotes: 2
Views: 182
Reputation: 81
Let's make use of the groupby function from DataFrames.jl, which is the current default for CSV.read. In the future, you can use CSV.File and then collect that into a DataFrame or Matrix as you see fit.
using DataFrames

# get groups based off the first column
groups = groupby(data, 1)
# groupcols returns the names of the grouping columns as a vector
gcol = groupcols(groups)
# for each group, drop the grouping column and collect into a DataFrame
# could easily collect into a Matrix, too!
data1, data2 = [DataFrame(g[!, Not(gcol)]) for g in groups]
EDIT: based on the comments from Bogumil I've updated the solution to use the current best practices!
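As a side note on the error in the question: data[i, 2:3] is a DataFrameRow, while data1[index1] addresses a single Float64 slot of the matrix, so Julia has nothing to convert. If you would rather stay close to the original idea than use groupby, a boolean mask is one possible alternative. This is only a minimal sketch: the small `data` frame and its column names (`flag`, `x`, `y`) are made up for illustration and are not the asker's actual data.

```julia
using DataFrames

# Hypothetical stand-in for the asker's `data`:
# a 0/1 flag in column 1, followed by two value columns.
data = DataFrame(flag = [0, 1, 0, 1],
                 x    = [1.0, 2.0, 3.0, 4.0],
                 y    = [10.0, 20.0, 30.0, 40.0])

# Boolean mask over the first column (true where the flag is 0).
is_zero = data[!, 1] .< 0.5

# Select rows with the mask, keep columns 2:3, and convert to a Matrix.
data1 = Matrix(data[is_zero, 2:3])
data2 = Matrix(data[.!is_zero, 2:3])
```

This avoids preallocating zeros(50,2), so it also works when the two groups are not exactly the same size.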
Upvotes: 1