Blibli

Reputation: 63

NullPointerException when writing parquet

I am trying to measure how long it takes to read and write parquet files in Amazon S3 (under a specific partition). For that I wrote a script that simply reads the files and then writes them back:

val df = sqlContext.read.parquet(path + "p1.parquet/partitionBy=partition1")
df.write.mode("overwrite").parquet(path + "p1.parquet/partitionBy=partition1")

However, I get a NullPointerException. I tried adding df.count() in between, but got the same error.

Upvotes: 2

Views: 2162

Answers (1)

Shaido

Reputation: 28392

The reason for the error is that Spark evaluates lazily: it only reads the data when it is actually used. Here that means Spark is reading from the files at the same time it is trying to overwrite them, which fails because data can't be overwritten while it is still being read.

Since this is for timing purposes, I'd recommend saving to a temporary location. An alternative would be to use .cache() when reading the data, perform an action to force the read (and actually cache the data), and then overwrite the file.
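For example, a minimal sketch of both workarounds, reusing the sqlContext and path from the question (the temporary location p1_tmp.parquet is a hypothetical name):

// Option 1: write to a temporary location instead of overwriting the source.
val df = sqlContext.read.parquet(path + "p1.parquet/partitionBy=partition1")
df.write.mode("overwrite").parquet(path + "p1_tmp.parquet/partitionBy=partition1")

// Option 2: cache the data and force the read before overwriting in place.
val df2 = sqlContext.read.parquet(path + "p1.parquet/partitionBy=partition1").cache()
df2.count() // action materializes the cache, so the source files are no longer needed
df2.write.mode("overwrite").parquet(path + "p1.parquet/partitionBy=partition1")

Note that option 2 assumes the data fits in the cache; if cached partitions are evicted, Spark may fall back to re-reading the source files during the write and hit the same problem.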

Upvotes: 2
