Reputation: 3270
I have a two-node Spark standalone cluster and I'm trying to read some parquet files that I just saved, but I'm getting a file-not-found exception.
Checking the location, it looks like all the parquet files were created on one of the nodes in my standalone cluster.
The problem is that when I read the parquet files back, it says it cannot find the xasdad.part file.
The only way I've managed to load them is to scale the standalone Spark cluster down to one node.
My question is: how can I load my parquet files while running more than one node in my standalone cluster?
Upvotes: 0
Views: 853
Reputation: 379
You have to put your files in a shared directory that is accessible to all Spark nodes under the same path; with a plain local path, each worker only sees the part files written to its own disk, so no single node can read the whole dataset back. Otherwise, use Spark with Hadoop HDFS, a distributed file system, as sketched below.
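A minimal sketch of the HDFS approach, assuming a hypothetical namenode address (hdfs://namenode:8020) and example paths; adjust both to your cluster:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: write and read parquet via an hdfs:// URI so every node
// resolves the same path. The namenode host/port and paths below
// are placeholders, not values from the original post.
object ParquetOnHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ParquetOnHdfs")
      .getOrCreate()

    val df = spark.range(1000).toDF("id") // stand-in for your real data

    // Writing to HDFS distributes the part files across the HDFS cluster,
    // not across each executor's local disk.
    df.write.mode("overwrite").parquet("hdfs://namenode:8020/data/example.parquet")

    // Reading back now works from any node, because the path is resolved
    // by HDFS rather than by a single machine's local filesystem.
    val restored = spark.read.parquet("hdfs://namenode:8020/data/example.parquet")
    println(restored.count())

    spark.stop()
  }
}
```

A shared mount (e.g. NFS) exposed under an identical path on every node works the same way with a plain path instead of an hdfs:// URI.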
Upvotes: 2