Reputation: 1881
When reading a CSV file with Polars in Python, we can use the dtypes parameter to specify the schema (for some columns). Can we do the same when reading or writing a Parquet file? I tried to specify the dtypes parameter, but it doesn't work.
I have some Parquet files generated from PySpark and want to load them into Rust. The Rust code requires unsigned integers, while Spark/PySpark has no unsigned integer types and writes signed integers to its Parquet output. To keep things simple, I'd like to convert the column types of the Parquet files before loading them into Rust. I know there are several ways to achieve this (in both pandas and polars), but I wonder whether there's an easy and efficient way to do it with polars.
The code I used to cast column types with polars in Python is below.
import polars as pl
...
df["id0"] = df.id0.cast(pl.datatypes.UInt64)
Upvotes: 5
Views: 7141
Reputation: 14670
Parquet files have a schema. We respect the schema of the DataFrame upon writing. If you want to change the schema you read or write, you need to cast columns in the DataFrame.
That's what we would do internally if we accepted a schema, so the efficiency is the same.
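For example, a minimal sketch of that approach, assuming hypothetical file names in.parquet and out.parquet and the signed id0 column from the question:

import polars as pl

# Read the Parquet file written by PySpark; its schema has signed integers.
df = pl.read_parquet("in.parquet")

# Cast the signed column to an unsigned dtype in the DataFrame.
df = df.with_columns(pl.col("id0").cast(pl.UInt64))

# Write a new Parquet file whose schema now carries UInt64 for id0.
df.write_parquet("out.parquet")

The same cast also works on a lazy scan (pl.scan_parquet(...).with_columns(...)) if you prefer to defer execution until collect.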
Upvotes: 4