WestCoastProjects

Reputation: 63022

"The associated location already exists" when saving a Spark DataFrame with mode('overwrite') set

I set mode('overwrite') on a saveAsTable() call:


df1.write.format('parquet').mode('overwrite').saveAsTable(
    'spark_no_bucket_table1')

Why does saving the table still fail with this error?

pyspark.sql.utils.AnalysisException: Can not create the managed 
      table('`spark_no_bucket_table1`'). 
The associated location('file:experiments/spark-warehouse/spark_no_bucket_table1') 
   already exists.

Upvotes: 4

Views: 3911

Answers (1)

Gabio

Reputation: 9484

From Spark's 2.4.0 migration guide:

Since Spark 2.4, creating a managed table with nonempty location is not allowed. An exception is thrown when attempting to create a managed table with nonempty location. To set true to spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation restores the previous behavior. This option will be removed in Spark 3.0.

So if you are on Spark >= 2.4.0 and < 3.0.0, you can restore the previous behavior by setting:

spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation","true")

For Spark 3.0.0 and later, you will have to manually clean up the data directory named in the error message (here, experiments/spark-warehouse/spark_no_bucket_table1) before writing the table again.
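A minimal sketch of that cleanup, assuming the warehouse is on the local filesystem (as the file: prefix in the error suggests). The helper name remove_stale_table_dir is hypothetical and not part of Spark; for HDFS or object storage you would need that system's own tooling instead.

```python
import shutil
from pathlib import Path
from urllib.parse import urlparse


def remove_stale_table_dir(location: str) -> bool:
    """Delete a leftover managed-table directory on the local filesystem.

    `location` is the path printed in the AnalysisException, with or
    without the leading `file:` scheme. Returns True if a directory was
    removed, False if there was nothing to remove.
    Hypothetical helper: it only handles local paths.
    """
    parsed = urlparse(location)
    if parsed.scheme not in ("", "file"):
        raise ValueError(f"not a local filesystem location: {location}")
    if not parsed.path:
        # Guard against resolving to the current directory by accident.
        raise ValueError(f"no path found in location: {location}")

    path = Path(parsed.path)
    if not path.is_dir():
        return False
    shutil.rmtree(path)  # removes the directory and everything under it
    return True
```

Call it with the location copied from the error message, for example remove_stale_table_dir('file:experiments/spark-warehouse/spark_no_bucket_table1'), then rerun the saveAsTable() write. This deletes whatever is in that directory, so check that it only holds leftovers from an earlier run.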

Upvotes: 5
