Themis

Reputation: 159

Trouble writing data to Delta Lake in Azure Databricks (Incompatible format detected)

I need to read a dataset into a DataFrame and then write the data to Delta Lake, but I get the following exception:

AnalysisException: Incompatible format detected.

You are trying to write to `dbfs:/user/[email protected]/delta/customer-data/` using Databricks Delta, but there is no transaction log present. Check the upstream job to make sure that it is writing using format("delta") and that you are trying to write to the table base path.

To disable this check, SET spark.databricks.delta.formatCheck.enabled=false
To learn more about Delta, see https://docs.azuredatabricks.net/delta/index.html

Here is the code preceding the exception:

from pyspark.sql.types import StructType, StructField, DoubleType, IntegerType, StringType

inputSchema = StructType([
  StructField("InvoiceNo", IntegerType(), True),
  StructField("StockCode", StringType(), True),
  StructField("Description", StringType(), True),
  StructField("Quantity", IntegerType(), True),
  StructField("InvoiceDate", StringType(), True),
  StructField("UnitPrice", DoubleType(), True),
  StructField("CustomerID", IntegerType(), True),
  StructField("Country", StringType(), True)
])

rawDataDF = (spark.read
  .option("header", "true")
  .schema(inputSchema)
  .csv(inputPath)
)

# write to Delta Lake
rawDataDF.write.mode("overwrite").format("delta").partitionBy("Country").save(DataPath) 

Upvotes: 12

Views: 44689

Answers (5)

Abdennacer Lachiheb

Reputation: 4888

Try setting this Spark conf to false:

spark.databricks.delta.formatCheck.enabled false
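For example, in a notebook cell (a minimal sketch using the key from the error message; note that this only suppresses the safety check, it does not fix a non-Delta directory at the target path):

# Disable Delta's format check for this session (suppresses the error,
# but does not fix non-Delta data already at the target path)
spark.conf.set("spark.databricks.delta.formatCheck.enabled", "false")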

Upvotes: 0

Aaditya Damle

Reputation: 1

I tried to work around this error by writing in Parquet format with .write.format("parquet").mode("overwrite").option("overwriteSchema", "true").save("location"), and I was able to save the data in that location.

But when I tried to access that table from the hive_metastore, the data was not accessible, and the error prompted me to save the data in Delta format. The error also points to an existing _delta_log directory in the parent folder where you are trying to save these files.

The _delta_log directory is created when you save a DataFrame in Delta format to a folder. I think the main cause of this error is the presence of that directory in your folder structure. Don't create subfolders for new tables inside a folder where a _delta_log directory and other data files are already present.

I was able to save the DataFrames in Delta format when I created a fresh folder structure.
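A quick way to check whether the target path already holds a Delta transaction log (a sketch for a Databricks notebook, reusing the path from the question):

# List the target directory and look for an existing _delta_log folder
files = dbutils.fs.ls("dbfs:/user/[email protected]/delta/customer-data/")
print([f.name for f in files if f.name.startswith("_delta_log")])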

Upvotes: 0

Daniel Moraite

Reputation: 516

You can also get this error if you try to read the data in a format that spark.read does not support, or if you do not specify the format at all.

The file format should be one of the supported formats, e.g. csv, text, json, parquet or avro:

dataframe = spark.read.format('csv').load(path) 

Upvotes: 0

user__42

Reputation: 573

I found this question with this search: "You are trying to write to *** using Databricks Delta, but there is no transaction log present."

In case someone else lands here with the same search: for me the solution was to explicitly code

.write.format("parquet")

because

.format("delta")

has been the default since Databricks Runtime 8.0, and I need "parquet" for legacy reasons.
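Spelled out as a full write, it looks like this (a sketch; df and outputPath are hypothetical placeholders):

# Force Parquet output instead of the Delta default on DBR 8.0+
# (df and outputPath are placeholders)
df.write.format("parquet").mode("overwrite").save(outputPath)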

Upvotes: 3

Michael Armbrust

Reputation: 1565

This error message is telling you that there is already data at the destination path (in this case dbfs:/user/[email protected]/delta/customer-data/), and that the data is not in Delta format (i.e. there is no transaction log). You can either choose a new path (which, based on the comments above, it seems you did) or delete that directory and try again.
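If you go the delete route, a minimal sketch using dbutils in a Databricks notebook (destructive, so double-check the path first):

# Recursively remove the conflicting non-Delta data, then rerun the Delta write
dbutils.fs.rm("dbfs:/user/[email protected]/delta/customer-data/", recurse=True)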

Upvotes: 17
