Reputation: 3340
PySpark creates a folder instead of a file. The command below creates an empty folder named proto.parquet in the directory:
df.write.parquet("output/proto.parquet")
I tried CSV and other formats, but the result is the same.
Upvotes: 0
Views: 2774
Reputation: 1961
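Spark creating a folder instead of a file is the expected behavior. Spark is a distributed system, so the DataFrame is split into partitions. Each task writes its own partition to a separate part file (part-00000-..., part-00001-..., and so on) inside the output folder. When the job completes, Spark also writes a _SUCCESS marker.
So what you are seeing is exactly how it is meant to work. MapReduce behaves the same way, writing one part file per reducer into its output directory.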
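To make the layout concrete, here is a toy, stdlib-only sketch of a partitioned writer. It is not Spark's code. The function names `write_partitioned` and `read_partitioned` are hypothetical, and the round-robin split is an assumption made for simplicity. It only mimics the resulting layout: one directory, one part file per partition, and a `_SUCCESS` marker. Reading the dataset back means reading every part file in that directory.

```python
import os
import tempfile


def write_partitioned(rows, out_dir, num_partitions):
    """Toy model of a distributed writer (hypothetical, not Spark's implementation).

    Produces a directory containing one part file per partition plus a
    _SUCCESS marker, similar to the layout Spark leaves behind.
    """
    os.makedirs(out_dir, exist_ok=True)
    # Assumption: split rows round-robin into partitions, as if spread over tasks.
    partitions = [rows[i::num_partitions] for i in range(num_partitions)]
    for idx, part in enumerate(partitions):
        # Each "task" writes only its own partition to its own file,
        # so no coordination on a single shared output file is needed.
        path = os.path.join(out_dir, f"part-{idx:05d}.csv")
        with open(path, "w") as f:
            for row in part:
                f.write(",".join(map(str, row)) + "\n")
    # Written last to signal that every partition was written successfully.
    open(os.path.join(out_dir, "_SUCCESS"), "w").close()
    return sorted(os.listdir(out_dir))


def read_partitioned(out_dir):
    """Read the 'dataset' back by concatenating every part file in the folder."""
    rows = []
    for name in sorted(os.listdir(out_dir)):
        if name.startswith("part-"):
            with open(os.path.join(out_dir, name)) as f:
                rows.extend(line.rstrip("\n").split(",") for line in f)
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "proto.csv")
        data = [(i, f"name{i}") for i in range(6)]
        # Prints the folder contents: the _SUCCESS marker and three part files.
        print(write_partitioned(data, out, num_partitions=3))
        # All six rows come back when the whole folder is read.
        print(len(read_partitioned(out)))
```

In PySpark itself, the folder is the dataset. `spark.read.parquet("output/proto.parquet")` reads all part files back as one DataFrame. If you really need a single part file, `df.coalesce(1).write.parquet(...)` produces one. It is still written inside a folder, though, and all the data goes through a single task.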
Upvotes: 1