I have a process that saved a dask dataframe to parquet. The result is a directory named `xxxxx.parquet` containing 73 individual `.parquet` files. I would like to get the metadata of the parquet files.
If I use `fastparquet`, it has an `info` property that treats the directory like a single file, so `fp.ParquetFile('xxxxx.parquet').info['rows']` returns the number of rows in the entire dataset.
I can't seem to find equivalent functionality in `pyarrow`, which I have to use instead of `fastparquet`. Is there one?