Writing a DataFrame into an existing directory using partitionBy

Question

With the code below, I am not able to write the DataFrame into an existing directory; the spark-submit job just exits. Is there a way to write to the existing directory instead of creating a new one?

Here, test is a DataFrame:

test.repartition(100).write.partitionBy("date").parquet(hdfslocation)

Ramesh Maharjan · Accepted Answer

Writing to an existing directory is possible as long as the file names differ between writes. The job exits because the default save mode, ErrorIfExists, fails when the target path already exists, so you need to choose a different mode explicitly.

If you want to overwrite the existing files in the directory, you don't need to change the file names; just set the mode option:

import org.apache.spark.sql.SaveMode

test.repartition(100).write.mode(SaveMode.Overwrite).partitionBy("date").parquet(hdfslocation)

The other save modes you can use are Append, ErrorIfExists (the default), and Ignore.
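As a rough illustration of how the modes behave against a directory that already exists, here is a minimal sketch. The local SparkSession, the sample rows, and the path /tmp/write_modes_sketch are assumptions made for the example; they stand in for the test DataFrame and hdfslocation from the question.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object WriteModesSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would get its
    // master from spark-submit.
    val spark = SparkSession.builder()
      .appName("write-modes-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-ins for `test` and `hdfslocation` in the question.
    val test = Seq(("a", "2018-01-01"), ("b", "2018-01-02")).toDF("value", "date")
    val hdfslocation = "/tmp/write_modes_sketch"

    // Overwrite: replaces whatever is already at the target path.
    test.write.mode(SaveMode.Overwrite).partitionBy("date").parquet(hdfslocation)

    // Append: adds new files next to the existing ones.
    test.write.mode(SaveMode.Append).partitionBy("date").parquet(hdfslocation)

    // Ignore: does nothing, because the path already exists.
    test.write.mode(SaveMode.Ignore).partitionBy("date").parquet(hdfslocation)

    // The directory now holds the rows from the Overwrite and Append writes.
    println(spark.read.parquet(hdfslocation).count())

    spark.stop()
  }
}
```

Note that with default settings, Overwrite replaces the whole target directory, not only the date partitions present in the DataFrame being written, so Append is usually the safer choice when the existing data must be kept.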
