Reputation: 2632
In pandas, when reading a CSV file with pandas.read_csv, we can set the keyword error_bad_lines=False, which skips lines with too many fields and guarantees that a DataFrame object is returned. See the documentation here.
In Julia I am using CSV.read to read some data, but no object is returned. Following the documentation, I used CSV.validate to see what the problem is and got CSV.TooManyColumnsError. Is there a similar keyword (to that of pandas) in Julia? More generally, how can I overcome this error and get a DataFrame returned?
Upvotes: 0
Views: 679
Reputation: 69829
Actually, by default CSV.jl should read in the data and drop the extra columns. Here is an example:
julia> using CSV, DataFrames
julia> println(read("x.txt", String))
a,b,c
1,2,3
4,5,6,7,8
1,2
1,2,3
julia> df = CSV.read("x.txt")
4×3 DataFrame
│ Row │ a      │ b      │ c       │
│     │ Int64⍰ │ Int64⍰ │ Int64⍰  │
├─────┼────────┼────────┼─────────┤
│ 1   │ 1      │ 2      │ 3       │
│ 2   │ 4      │ 5      │ 6       │
│ 3   │ 1      │ 2      │ missing │
│ 4   │ 1      │ 2      │ 3       │
So in short: over-long lines are not skipped but truncated, and over-short lines (as you can see in the example) are padded with missing. In all cases you should get a DataFrame object returned.
Of course, CSV.validate should error on the first invalid line:
julia> CSV.validate("x.txt")
ERROR: CSV.TooManyColumnsError("row=2, col=3: expected 3 columns then a newline or EOF; parsed row: '4, 5, 6'")
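If you would rather skip malformed lines entirely, as pandas does, one option is to filter the text before parsing it. The following is only a minimal sketch using Base Julia: drop_ragged_lines is a hypothetical helper, not something CSV.jl provides, and it assumes a simple file where no quoted field contains the delimiter or a newline.

```julia
# Hypothetical helper (not part of CSV.jl): keep the header and only those
# rows whose field count matches it, similar to pandas skipping bad lines.
# Assumption: no quoted fields containing the delimiter or a line break.
function drop_ragged_lines(text::AbstractString; delim::AbstractString = ",")
    # Split on LF or CRLF and ignore empty lines (e.g. a trailing newline).
    lines = filter(!isempty, split(text, r"\r?\n"))
    isempty(lines) && return ""
    # The header row defines the expected number of columns.
    ncols = length(split(first(lines), delim))
    kept = filter(line -> length(split(line, delim)) == ncols, lines)
    return join(kept, '\n')
end

# Same content as x.txt in the example above.
raw = "a,b,c\n1,2,3\n4,5,6,7,8\n1,2\n1,2,3\n"
cleaned = drop_ragged_lines(raw)
println(cleaned)
```

The cleaned string can then be wrapped in an IOBuffer and handed to CSV.read. Note that recent CSV.jl releases expect a sink argument, so the call would probably look like CSV.read(IOBuffer(cleaned), DataFrame); check this against the version you have installed.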
Upvotes: 1