Reputation: 1530
I have files with the .snappy.parquet extension that I need to read into my Jupyter notebook and convert to a pandas DataFrame.
import numpy
import pyarrow.parquet as pq
filename = "part-00000-tid-2430471264870034304-5b82f32f-de64-40fb-86c0-fb7df2558985-1598426-1-c000.snappy.parquet"
df = pq.read_table(filename).to_pandas()
The error is:
ArrowNotImplementedError: lists with structs are not supported
Upvotes: 6
Views: 20346
Reputation: 8816
As of 2019-11-30, columns of type List[Struct[..]] (i.e. mixed nesting of lists and structs) are not supported by Apache Arrow. As mentioned in a different answer, the related issue is https://issues.apache.org/jira/browse/ARROW-1644.
To still read this file, you can read in all columns that are of supported types by supplying the columns argument to pyarrow.parquet.read_table. To find out which columns have the complex nested types, look at the schema of the file using pyarrow.parquet.ParquetFile(filename).schema.
Upvotes: 4