KeyboardHunter

Reputation: 81

How to filter data from a CSV based on the value of a column

I have been trying to filter a data set of 100 entries based on the value in the first column, in order to create two separate variables from my data.

The rows are split according to whether the first column is 0 or 1, and I then want to store the remaining columns without that first column. So far I have tried the following code, but it is not working.

data1 = zeros(50,2)
data2 = zeros(50,2)

index1 = 0
index2 = 0

for i in 1:100    
    if data[i,1] < 0.5
        data1[index1] = data[i, 2:3]
        index+=1
    else 
        data2[index2] = data[i, 2:3]
        index2+=1
    end
end

The only thing I'm getting is this error, which I cannot understand:

MethodError: Cannot `convert` an object of type DataFrames.DataFrameRow{DataFrames.DataFrame,DataFrames.SubIndex{DataFrames.Index,UnitRange{Int64},UnitRange{Int64}}} to an object of type Float64
Closest candidates are:
  convert(::Type{T}, !Matched::T) where T<:Number at number.jl:6
  convert(::Type{T}, !Matched::Number) where T<:Number at number.jl:7
  convert(::Type{T}, !Matched::Base.TwicePrecision) where T<:Number at twiceprecision.jl:250
  ...

Stacktrace:
 [1] setindex!(::Array{Float64,2}, ::DataFrames.DataFrameRow{DataFrames.DataFrame,DataFrames.SubIndex{DataFrames.Index,UnitRange{Int64},UnitRange{Int64}}}, ::Int64) at .\array.jl:825
 [2] top-level scope at .\In[47]:9

Upvotes: 2

Views: 182

Answers (1)

Miles Lucas

Reputation: 81

Let's make use of the groupby function from DataFrames.jl; a DataFrame is currently the default output of CSV.read. In the future, you can use CSV.File and then collect that into a DataFrame or Matrix as you see fit.

using DataFrames

# get groups based off the first column
groups = groupby(data, 1)
# names of the grouping column(s), returned as a vector
gcols = groupcols(groups)
# for each group, drop that column and collect into a DataFrame
# could easily collect into a Matrix, too!
data1, data2 = [DataFrame(g[!, Not(gcols)]) for g in groups]

EDIT: based on the comments from Bogumil I've updated the solution to use the current best practices!
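
As for the error in the question: the stack trace shows setindex! being called with a DataFrameRow and a single Int64 index, so data1[index1] = data[i, 2:3] is trying to store a whole row in one Float64 slot. The counters also start at 0, while Julia arrays are 1-based. If you would rather not use groupby, a boolean mask on the first column is another option. The following is only a minimal sketch: the toy DataFrame and its column names (label, x, y) are made up to stand in for your CSV data, and it assumes the first column holds exactly 0 or 1.

```julia
using DataFrames

# Hypothetical toy data standing in for the CSV contents:
# column 1 holds the 0/1 label, columns 2 and 3 hold the values.
data = DataFrame(label = [0, 1, 0, 1],
                 x = [1.0, 2.0, 3.0, 4.0],
                 y = [10.0, 20.0, 30.0, 40.0])

# Boolean mask: true for rows whose first column equals 0
is_zero = data[!, 1] .== 0

# Keep the matching rows and drop the first (label) column
data1 = data[is_zero, Not(1)]
data2 = data[.!is_zero, Not(1)]

# Convert to plain matrices if numeric arrays are needed
mat1 = Matrix(data1)
mat2 = Matrix(data2)
```

Unlike destructuring the groups, this makes it explicit which variable receives the rows labelled 0 and which receives the rows labelled 1.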

Upvotes: 1
