Benjamin Du

Reputation: 1881

Can I specify a schema when reading/writing a Parquet file using Polars in Python?

When reading a CSV file using Polars in Python, we can use the dtypes parameter to specify the schema to use (for some columns). I wonder whether we can do the same when reading or writing a Parquet file. I tried specifying the dtypes parameter, but it doesn't work.
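
For reference, this is the CSV pattern I mean (the file name and column name here are just placeholders):

import polars as pl

# read_csv accepts a partial schema via dtypes;
# "data.csv" is a placeholder file name
df = pl.read_csv("data.csv", dtypes={"id0": pl.UInt64})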

I have some Parquet files generated from PySpark and want to load those Parquet files into Rust. The Rust code requires unsigned integers, while Spark/PySpark has no unsigned integer types and writes signed integers into Parquet files. To make things simpler, I'd like to convert the column types of the Parquet files before loading them into Rust. I know there are several ways to achieve this (in both pandas and polars), but I wonder whether there's an easy and efficient way to do it using polars.

The code I used to cast column types using polars in Python is below.

import polars as pl

...
df["id0"] = df.id0.cast(pl.datatypes.UInt64)

Upvotes: 5

Views: 7141

Answers (1)

ritchie46

Reputation: 14670

Parquet files have a schema. We respect the schema of:

  • the parquet file upon reading
  • the DataFrame upon writing

If you want to change the schema on read/write, you need to cast the columns of the DataFrame.

That's what we would do internally if we accepted a schema argument, so the efficiency is the same.
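
A minimal sketch of that round trip, casting the id0 column from the question (the file names are hypothetical):

import polars as pl

# read the Parquet file; its embedded schema is respected
df = pl.read_parquet("data.parquet")

# cast the signed column to the unsigned type Rust expects
df = df.with_columns(pl.col("id0").cast(pl.UInt64))

# write back; the DataFrame's (now casted) schema is respected
df.write_parquet("data_unsigned.parquet")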

Upvotes: 4
