Reputation: 167
I'm working with an Apache Beam pipeline that writes a Parquet file at the end and validates the data against a schema using pyarrow, and I have no idea why I'm getting this error:
pyarrow.lib.ArrowInvalid: Invalid null value [while running 'Write final dataset/Write core dataset facebook_insights_performance_ads/Write/WriteImpl/WriteBundles']
For debugging, I saved the data to a text file and there's nothing wrong with it.
thanks!
Upvotes: 1
Views: 8074
Reputation: 918
Rewriting from comments for others:
This error is raised only when converting Python data (e.g. pandas, NumPy, Python lists or dicts) to Arrow. It occurs when the schema tells Arrow that a column has the null data type, which means every value in it must be null, but the data itself contains values that are not null. For example:
import pyarrow as pa
pa.array([None, None, 1], type=pa.null())
Upvotes: 3