matthewmturner

Reputation: 606

PyArrow Table: Cast a Struct within a ListArray column to a new schema

I have a parquet file with a struct field inside a ListArray column, where the data type of one of the struct's fields changed from int to float in some new data.

In order to combine the new and old data, I had been reading the active & historical parquet files with pq.read_table, then using pa.concat_tables to combine them and write the new file.

So, to make the schemas of the two tables compatible before concatenating, I do the following:

import pyarrow as pa
import pyarrow.parquet as pq

active = pq.read_table(r"path\to\active\parquet")
active_schema = active.schema

hist = pq.read_table(r"path\to\hist\parquet")
hist = hist.cast(target_schema=active_schema)

combined = pa.concat_tables([active, hist])

But I get the following error when casting:

ArrowNotImplementedError: Unsupported cast from struct<code: string, unit_price: struct<amount: int64, currency: string>, line_total: struct<amount: int64, currency: string>, reversal: bool, include_for: list<item: string>, quantity: int64, seats: int64, units: int64, percentage: int64> to struct using function cast_struct

Based on this, it seems I won't be able to do the cast.

So my question is: how can I go about merging these datasets / updating the schema on the old table? I'm trying to stay within the Arrow / Parquet ecosystem if possible.

Upvotes: 0

Views: 4552

Answers (1)

joris

Reputation: 139162

Unfortunately, casting a struct to a similar struct type but with a different field type is not yet implemented (see https://issues.apache.org/jira/browse/ARROW-1888 for the feature request).

I think currently the only possible workaround is to extract the struct column, cast its fields separately, recreate the struct column from those, and update the table with the result.

A small example of this workflow, starting from the following table with the struct column:

>>> table = pa.table({'col1': [1, 2, 3], 'col2': [{'a': 1, 'b': 2}, None, {'a':3, 'b':4}]})
>>> table
pyarrow.Table
col1: int64
col2: struct<a: int64, b: int64>
  child 0, a: int64
  child 1, b: int64

and assume the following target schema (where one field of the struct column is changed from int to float):

>>> new_schema = pa.schema([('col1', pa.int64()), ('col2', pa.struct([('a', pa.int64()), ('b', pa.float64())]))])
>>> new_schema
col1: int64
col2: struct<a: int64, b: double>
  child 0, a: int64
  child 1, b: double

Then the workaround looks like:

# cast fields separately
struct_col = table["col2"]
new_struct_type = new_schema.field("col2").type
new_fields = [field.cast(typ_field.type) for field, typ_field in zip(struct_col.flatten(), new_struct_type)]
# create new structarray from separate fields
import pyarrow.compute as pc
new_struct_array = pc.make_struct(*new_fields, field_names=[f.name for f in new_struct_type])
# replace the table column with the new array
col_idx = table.schema.get_field_index("col2")
new_table = table.set_column(col_idx, new_schema.field("col2"), new_struct_array)

>>> new_table
pyarrow.Table
col1: int64
col2: struct<a: int64, b: double>
  child 0, a: int64
  child 1, b: double

Upvotes: 2
