NaWeeD

Reputation: 609

java.lang.IllegalArgumentException: Illegal Capacity: -102 when reading a large parquet file by pyspark

I have a large Parquet file (~5 GB) that I want to load into Spark. The following command executes without any error:

df = spark.read.parquet("path/to/file.parquet")

But when I try to do any operation on it, such as .show() or .repartition(n), I run into the following error:

java.lang.IllegalArgumentException: Illegal Capacity: -102

Any ideas on how I can fix this?

Upvotes: 2

Views: 2683

Answers (1)

Jeff Baranski

Reputation: 1269

This is an integer overflow bug in the underlying Parquet reader; see https://issues.apache.org/jira/browse/PARQUET-1633 for details.

Upgrade PySpark to 3.2.1; that release bundles the parquet-hadoop-1.12.2 jar, which contains the actual fix.
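A quick way to verify whether an installed PySpark is new enough is to compare its version string against 3.2.1. This is a minimal sketch; `has_parquet_1633_fix` is a hypothetical helper name, not part of PySpark:

```python
# Minimal sketch: check whether a PySpark version string is at least
# 3.2.1, the first release that bundles parquet-hadoop-1.12.2 with the
# PARQUET-1633 fix. `has_parquet_1633_fix` is a hypothetical helper.
def has_parquet_1633_fix(version: str) -> bool:
    # Compare only the numeric major.minor.patch components.
    parts = tuple(int(p) for p in version.split(".")[:3])
    return parts >= (3, 2, 1)

# In a live session you would pass pyspark.__version__ here.
print(has_parquet_1633_fix("3.1.2"))  # → False (older release, still affected)
print(has_parquet_1633_fix("3.2.1"))  # → True (carries the fix)
```

In a pip-managed environment, `pip install --upgrade 'pyspark>=3.2.1'` would pull in a release with the patched jar.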

Upvotes: 3
