Philip K. Adetiloye

Reputation: 3270

Spark standalone cluster read parquet files after saving

I have a two-node Spark standalone cluster and I'm trying to read some Parquet files that I just saved, but I'm getting a file-not-found exception.

Checking the location, it looks like all the Parquet part files were created on just one of the nodes in my standalone cluster.

The problem now is that when I read the Parquet files back, Spark says it cannot find the xasdad.part file.

The only way I have managed to load them is by scaling the standalone Spark cluster down to one node.

My question is: how can I load my Parquet files while running more than one node in my standalone cluster?

Upvotes: 0

Views: 853

Answers (1)

jarjar

Reputation: 379

You have to put your files in a shared directory that is accessible to all Spark nodes under the same path. Otherwise, use Spark with Hadoop HDFS, a distributed file system.
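For illustration, here is a minimal sketch of writing and reading Parquet against an HDFS path instead of a local one, assuming Spark 2.x+; the namenode host, port, and paths are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object ParquetOnSharedStorage {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-on-shared-storage")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

    // Writing to an HDFS path: every executor writes its part files to
    // storage that all nodes can see, not to its own local disk.
    // (hdfs://namenode:9000 is a hypothetical namenode address.)
    df.write.parquet("hdfs://namenode:9000/tmp/events.parquet")

    // Reading back now works from any node, because the path resolves to
    // the same distributed file system everywhere.
    val loaded = spark.read.parquet("hdfs://namenode:9000/tmp/events.parquet")
    loaded.show()

    spark.stop()
  }
}
```

A shared mount such as `file:///mnt/shared/events.parquet` would behave the same way, as long as it is mounted at the identical path on every node. Your original error happens because each worker wrote its part files to its own local disk, so no single node ever holds the complete set.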

Upvotes: 2
