Reputation: 175
I've started playing with Apache Parquet and was surprised to find two writer versions:
PARQUET_1_0 ("v1"),
PARQUET_2_0 ("v2");
I tried dumping the metadata with parquet-tools to determine the version, but the output did not include this information.
I currently have a Parquet file.
How do I determine which Parquet writer version was used to write it?
Upvotes: 0
Views: 1196
Reputation: 11
I've been struggling with this too. If you install parquet-tools, you can run:
parquet-tools inspect <parquet file> --detail | head -n2
and you get a version, which is different from the format version:
FileMetaData
version = 1
However, I'm not sure whether it is affected by the file writer version.
Upvotes: 1
Reputation: 31
You can use pyarrow.parquet to view the writer version of the Parquet file:
import pyarrow.parquet as pq
parquet_file = pq.ParquetFile('sample.parquet')
print(parquet_file.metadata)
This would print something like:
<pyarrow._parquet.FileMetaData object at 0x7f72447fc530>
created_by: parquet-mr version 1.12.2 (build d35ce51f56a2166b09164cc89d7c18ce346dc83f)
num_columns: 14
num_rows: 11464901
num_row_groups: 1
format_version: 1.0
serialized_size: 3277
format_version is what you are looking for.
See https://arrow.apache.org/docs/python/parquet.html
Upvotes: 1