Bill

Reputation: 363

parquet timestamp overflow with fastparquet/pyarrow

I have a parquet file that I am reading from S3 using fastparquet/pandas. The file has a column with the date 2022-10-06 00:00:00, but it is being read back as 1970-01-20 06:30:14.400. Please see the code, the errors, and a screenshot of the parquet file below. I am not sure why this is happening; 2022-09-01 00:00:00 seems to be fine. If I choose "pyarrow" as the engine, it fails with an exception.

pyarrow error:
    pyarrow.lib.ArrowInvalid: Casting from timestamp[us] to timestamp[ns] would result in out of bounds timestamp: 101999952000000000

Please advise.

fastparquet error:

OverflowError: value too large
Exception ignored in: 'fastparquet.cencoding.time_shift'
OverflowError: value too large
OverflowError: value too large

code:

import io
import boto3
import pandas as pd

s3_client = boto3.client('s3')
obj = s3_client.get_object(Bucket="blah", Key="blah1")
df = pd.read_parquet(io.BytesIO(obj['Body'].read()), engine="fastparquet")

Upvotes: 0

Views: 575

Answers (1)

mdurant

Reputation: 28684

When pyarrow and fastparquet agree that the data isn't valid, I expect that must be the case. As a comment suggests, it sounds like there is confusion in the column's time units. You didn't say where the data came from, but at a wild guess, this may be because of the change in the parquet standard (roughly v1 -> v2), in which the former "converted" types were extended by new "logical" types. Newer parquet files tend to carry BOTH styles of type declaration, so there is a chance they are inconsistent.
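
One way to check is to look at how the column is actually declared in the file's metadata. The snippet below is only a sketch, reusing the placeholder bucket/key from the question; it prints both the Arrow-level schema and the raw Parquet schema, which is where an old-style "converted" type and a new-style "logical" type could disagree:

    import io

    import boto3
    import pyarrow.parquet as pq

    # Same placeholder bucket/key as in the question.
    obj = boto3.client('s3').get_object(Bucket="blah", Key="blah1")
    buf = io.BytesIO(obj['Body'].read())

    pf = pq.ParquetFile(buf)
    # Arrow-level schema: the unit the column will be read with, e.g. timestamp[us].
    print(pf.schema_arrow)
    # Raw Parquet schema: physical type plus the converted/logical annotations.
    print(pf.schema)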

In the fastparquet main branch (unreleased), there has been some work to consolidate the different ways of declaring time types. Maybe for your data it now does the right thing.
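
Until then, if you only need to see what is actually stored, you can avoid pandas' nanosecond cast entirely by reading with pyarrow and converting the timestamps to plain Python datetime objects. Again just a sketch with the question's placeholder bucket/key; the stored values may still look wrong, but the out-of-bounds cast no longer raises:

    import io

    import boto3
    import pyarrow.parquet as pq

    obj = boto3.client('s3').get_object(Bucket="blah", Key="blah1")
    table = pq.read_table(io.BytesIO(obj['Body'].read()))

    # timestamp_as_object=True keeps timestamps as datetime.datetime objects
    # instead of casting to nanosecond datetime64, so out-of-range values
    # don't trigger the ArrowInvalid error from the question.
    df = table.to_pandas(timestamp_as_object=True)
    print(df.head())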

Upvotes: 0
