justin

Reputation: 99

Error with writing Parquet files to local disk

I am writing Spark DataFrames to local disk and I cannot read them back.

val path = "file:///mnt/resources/....."
df.write.parquet(path) 
val d = spark.read.parquet(path)

I get the following error:

org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;

I am fine with reading and writing from/to Azure Data Lake or Storage, but not with local disk. Has anyone faced the same issue? How can it be solved? I also tested with .csv files, and in that case it says the file does not exist, even though I can see the file when I log in to the worker nodes.

Upvotes: 1

Views: 1098

Answers (1)

user8935827

Reputation: 21

TL;DR Writes to the local file system are useful only for testing in local mode.

You should not use the local file system for writes in a cluster deployment. In that case each executor writes to its own local file system, so the output is scattered across the worker nodes and it is simply impossible to achieve consistent reads afterwards.
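
A minimal sketch of the usual alternative, assuming an ADLS Gen2 account (the container and account names below are placeholders, not taken from your setup): write to shared storage that every executor can reach, instead of a local path.

// Assumption: ADLS Gen2 container reachable from all nodes; replace the placeholders.
val sharedPath = "abfss://<container>@<account>.dfs.core.windows.net/output/df.parquet"
df.write.mode("overwrite").parquet(sharedPath)   // every executor writes to the same shared location
val d = spark.read.parquet(sharedPath)           // the driver can now read a consistent dataset back

// For quick single-machine tests only, run in local mode so the driver and the
// "executors" share one file system, and file:/// paths then work:
//   spark-shell --master local[*]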

Upvotes: 2
