jbssm
jbssm

Reputation: 7161

Read parquet file compressed with zstd

I am new to Julia and I am trying to port some stuff I did in Python.

I have a file I wrote in Python, with a DataFrame to a parquet file using the zstd compression lib (supported by both pandas and fastparquet, parquet file writing).

It gives an error since ParquetFiles or FileIO (not sure which one is responsible for the decompression), doesn't support zstd.

Any ideas on how to read this file in Julia?

using DataFrames
using ParquetFiles
using FileIO

test = DataFrame(load("test.parquet"))

Unknown compression codec for column chunk: 6  
Stacktrace:
 [1] error(::String) at ./error.jl:33 
 [2] bytes at /home/morgado/.julia/packages/Parquet/qSvbc/src/reader.jl:149 [inlined]  
 [3] bytes at /home/morgado/.julia/packages/Parquet/qSvbc/src/reader.jl:140 [inlined]  
 [4] values(::ParFile, ::Parquet.Page) at /home/morgado/.julia/packages/Parquet/qSvbc/src/reader.jl:232  
 [5] values(::ParFile, ::Parquet.PAR2.ColumnChunk) at /home/morgado/.julia/packages/Parquet/qSvbc/src/reader.jl:178  
 [6] setrow(::ColCursor{Int64}, ::Int64) at /home/morgado/.julia/packages/Parquet/qSvbc/src/cursor.jl:144  
 [7] ColCursor(::ParFile, ::UnitRange{Int64}, ::String, ::Int64) at /home/morgado/.julia/packages/Parquet/qSvbc/src/cursor.jl:115  
 [8] (::getfield(Parquet, Symbol("##11#12")){ParFile,UnitRange{Int64},Int64})(::String) at ./none:0  
 [9] collect(::Base.Generator{Array{AbstractString,1},getfield(Parquet, Symbol("##11#12")){ParFile,UnitRange{Int64},Int64}}) at ./generator.jl:47  
 [10] RecCursor(::ParFile, ::UnitRange{Int64}, ::Array{AbstractString,1}, ::JuliaBuilder{ParquetFiles.RCType361}, ::Int64) at /home/morgado/.julia/packages/Parquet/qSvbc/src/cursor.jl:269 (repeats 2 times)  
 [11] getiterator(::ParquetFiles.ParquetFile) at /home/morgado/.julia/packages/ParquetFiles/cLLFb/src/ParquetFiles.jl:74 
 [12] nondatavaluerows(::ParquetFiles.ParquetFile) at /home/morgado/.julia/packages/Tables/IT0t3/src/tofromdatavalues.jl:16  
 [13] columns at /home/morgado/.julia/packages/Tables/IT0t3/src/fallbacks.jl:173 [inlined]  
 [14] #DataFrame#393(::Bool, ::Type, ::ParquetFiles.ParquetFile) at /home/morgado/.julia/packages/DataFrames/VrZOl/src/other/tables.jl:34  
 [15] DataFrame(::ParquetFiles.ParquetFile) at /home/morgado/.julia/packages/DataFrames/VrZOl/src/other/tables.jl:25  
 [16] top-level scope at In[25]:8

Upvotes: 3

Views: 1102

Answers (0)

Related Questions