S Bourgeois

Reputation: 95

How can one load only specific columns using CSV in Julia?

I'm new to Julia, and I've been looking for a way to load only specific columns from a space-separated file. The solutions given on CSV.jl's GitHub page (https://github.com/JuliaData/CSV.jl/issues/154) don't seem to work on either Julia 1.0.1 or 1.3.1.

This is an example input file named test.txt:

a b c.p
1 2 3
4 5 6

julia> using CSV, Tables

julia> df = CSV.File(inputfile, delim=" ", header=1, type=String) |> select(:a, :b) |> DataFrame
ERROR: UndefVarError: select not defined
Stacktrace:
 [1] top-level scope at none:0

julia> df = CSV.File("test.txt", delim=" ", header=1, type=String) |> Tables.select(:a, :b) |> DataFrame
ERROR: UndefVarError: select not defined
Stacktrace:
 [1] getproperty(::Module, ::Symbol) at ./sysimg.jl:13
 [2] top-level scope at none:0

So, here are my questions:

  1. What would be the correct syntax to load columns a and b?
  2. What would be the correct syntax to load columns a and c.p? (set apart by the use of a dot in the column name)
  3. What would be the correct syntax to load specific columns by number, for example columns 1 and 3, from a file that doesn't have a header?

Thanks,

Steph

Upvotes: 5

Views: 1988

Answers (1)

Bogumił Kamiński

Reputation: 69839

Use the select keyword argument of CSV.File, as described in the CSV.jl documentation:

julia> CSV.File(inputfile, delim=" ", header=1, type=String, select=[:a,:b])
2-element CSV.File{false}:
 CSV.Row{false}: (a = "1", b = "2")
 CSV.Row{false}: (a = "4", b = "5")

julia> CSV.File(inputfile, delim=" ", header=1, type=String, select=[:a,:var"c.p"])
2-element CSV.File{false}:
 CSV.Row{false}: (a = "1", c.p = "3")
 CSV.Row{false}: (a = "4", c.p = "6")

julia> CSV.File(inputfile, delim=" ", header=1, type=String, select=[1,3])
2-element CSV.File{false}:
 CSV.Row{false}: (a = "1", c.p = "3")
 CSV.Row{false}: (a = "4", c.p = "6")
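Since the question pipes the result into a DataFrame, here is a minimal sketch of the same select call materialized that way. It assumes the DataFrames package is installed, recreates test.txt in a temporary directory, and omits the type argument, so the values are parsed as integers rather than strings. Symbol("c.p") is used in place of :var"c.p" because the var"..." syntax is not available on Julia 1.0.

```julia
using CSV, DataFrames

# Recreate the question's test.txt in a temporary directory.
path = joinpath(mktempdir(), "test.txt")
write(path, "a b c.p\n1 2 3\n4 5 6\n")

# Symbol("c.p") builds the dotted column name without the var"..." syntax.
df = CSV.File(path; delim=" ", header=1, select=[:a, Symbol("c.p")]) |> DataFrame

# A dotted name cannot be reached as df.c.p, so index the column by Symbol.
println(df[!, Symbol("c.p")])
```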

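If the file has no header, use the header option (for example header=false, which auto-generates the names Column1, Column2, ...), as described in the CSV.jl documentation.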
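For the third question (columns 1 and 3 of a headerless file), a minimal sketch might look like the following. The file noheader.txt is a hypothetical example written to a temporary directory, and the type argument is left out, so CSV.jl infers integer columns here.

```julia
using CSV

# Hypothetical headerless, space-separated file, created so the sketch is self-contained.
path = joinpath(mktempdir(), "noheader.txt")
write(path, "1 2 3\n4 5 6\n")

# header=false: there is no header row, so names are generated as Column1, Column2, ...
# select=[1, 3]: keep only the first and third columns, chosen by position.
f = CSV.File(path; delim=" ", header=false, select=[1, 3])

# Iterate the rows and access the two selected columns by their generated names.
for row in f
    println(row.Column1, " ", row.Column3)
end
```

Selecting by position avoids depending on the generated names, although the retained columns are still called Column1 and Column3.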

Upvotes: 8
