Reputation: 167

pyarrow.lib.ArrowInvalid: Invalid null value

I'm working with an Apache Beam pipeline that writes a Parquet file at the end and validates the data using pyarrow and a schema, and I have no idea why I'm getting this error:

pyarrow.lib.ArrowInvalid: Invalid null value [while running 'Write final dataset/Write core dataset facebook_insights_performance_ads/Write/WriteImpl/WriteBundles']

For debugging, I saved the data to a text file and there's nothing wrong with it.

thanks!

Upvotes: 1

Answers (1)

Cubez

Reputation: 918

Rewriting from comments for others:

This error message is only emitted when converting data from Python (e.g. pandas, NumPy, Python lists or dicts) to Arrow. It occurs when the schema tells Arrow that a column has the null data type, which means every value must be null, but the data itself contains values that are not null. For example:

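import pyarrow as pa
# The declared type is null, but the list contains 1, so this raises ArrowInvalid: Invalid null value
pa.array([None, None, 1], type=pa.null())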

Upvotes: 3
