MikeB2019x

Reputation: 1205

How to read Parquet metadata for a pyarrow Dataset

I have a process that saved a dask dataframe to parquet. The result is a directory with the name xxxxx.parquet inside of which are 73 individual .parquet files. I would like to get the metadata of the parquet files.

If I use fastparquet, its ParquetFile has an .info attribute and treats the directory like a single file. So I can do fp.ParquetFile("xxxxx.parquet").info['rows'] and it will return the number of rows in the entire data set.

I can't seem to find equivalent functionality in pyarrow, which I have to use instead of fastparquet. Is there one?

Upvotes: 0

Views: 98

Answers (0)
