Tom J Muthirenthi

Reputation: 3340

pyspark creates output file as folder

PySpark creates a folder instead of a file. The command below creates an empty folder named proto.parquet in the directory.

df.write.parquet("output/proto.parquet")

I tried CSV and other formats, but the result is the same.

Upvotes: 0

Views: 2774

Answers (1)

xmorera

Reputation: 1961

Spark creating a folder instead of a file is the expected behavior. Spark is a distributed system, so data is processed in partitions, and each partition is written out by a worker as its own part file inside that folder.

So what you are seeing is how it is supposed to work. MapReduce writes its output the same way.
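If you really need a single file, one common workaround is to reduce the DataFrame to one partition before writing, for example df.coalesce(1).write.parquet("output/proto.parquet"), and then pull the lone part file out of the folder. The sketch below covers only that second step. It assumes the output is on the local filesystem (not HDFS or S3), and promote_single_part_file is a hypothetical helper name, not part of PySpark.

```python
import shutil
from pathlib import Path


def promote_single_part_file(output_dir, target_file):
    """Move the only part-* file in a Spark output folder to target_file.

    Hypothetical helper, not a PySpark API. Assumes the folder was written
    with a single partition, e.g. df.coalesce(1).write.parquet(output_dir),
    and that it lives on the local filesystem.
    """
    # Data files are named part-*; _SUCCESS and the hidden .crc checksum
    # files do not match this pattern and are left in place.
    parts = sorted(p for p in Path(output_dir).glob("part-*") if p.is_file())
    if len(parts) != 1:
        raise ValueError(f"expected exactly one part file, found {len(parts)}")

    target = Path(target_file)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(parts[0]), str(target))
    return target
```

Usage would look like promote_single_part_file("output/proto.parquet", "output/proto_single.parquet"). Keep in mind that coalesce(1) pushes all the data through a single task, so this is only reasonable for small outputs.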

Upvotes: 1
