Sjors

Reputation: 93

Save dataframe as AVRO Spark 2.4.0

Since Spark 2.4.0 it's possible to save as AVRO without external jars. However, I can't get it working at all. My code looks like this:

key = 'filename.avro'
df.write.mode('overwrite').format("avro").save(key)

I get the following error:

pyspark.sql.utils.AnalysisException: 'Failed to find data source: avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".;'

So I look at the Apache Avro Data Source Guide (https://spark.apache.org/docs/latest/sql-data-sources-avro.html) and it gives the following example:

df=spark.read.format("avro").load("examples/src/main/resources/users.avro")

df.select("name","favorite_color").write.format("avro").save("namesAndFavColors.avro")

That is the same as my code, so I'm lost. Does anyone have an idea what is going wrong?

Upvotes: 2

Views: 4124

Answers (3)

Nabia Salman

Reputation: 632

You can use this line to save in Avro format:

df2.write.format("avro").save(file_location + "file_name.avro")

Upvotes: 1

Ajay Kharade

Reputation: 1525

The spark-avro module is external and not included in spark-submit or spark-shell by default.

As with any Spark application, spark-submit is used to launch your application. spark-avro_2.11 and its dependencies can be added directly to spark-submit using --packages, for example:

./bin/spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.0 ...

For experimenting in spark-shell, you can also use --packages to add org.apache.spark:spark-avro_2.11 and its dependencies directly:

./bin/spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.0 ...
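Since the question uses PySpark, the same package can probably also be requested from inside the script through the spark.jars.packages configuration property. The following is only a minimal sketch, assuming a Spark 2.4.0 build on Scala 2.11 (as in the commands above) and a script launched with plain python so that no Spark JVM is running yet. The helper names spark_avro_coordinate and build_session and the app name are made up for illustration.

```python
def spark_avro_coordinate(spark_version, scala_version="2.11"):
    """Build the Maven coordinate of the external spark-avro module.

    The Scala suffix must match the Scala version of your Spark build;
    "2.11" is an assumption taken from the commands in this answer.
    """
    return "org.apache.spark:spark-avro_{}:{}".format(scala_version, spark_version)


def build_session(spark_version="2.4.0"):
    """Create a SparkSession that asks Spark to fetch spark-avro at startup.

    Hypothetical helper: the setting only takes effect if no SparkContext
    (and therefore no JVM) has been started before this call.
    """
    # Imported here so the coordinate helper works without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName("avro-example")  # illustrative name
        .config("spark.jars.packages", spark_avro_coordinate(spark_version))
        .getOrCreate()
    )


# Intended usage (needs pyspark and network access to resolve the package):
#   spark = build_session()
#   df = spark.range(3)
#   df.write.mode("overwrite").format("avro").save("filename.avro")
```

If a session already exists (for example in a notebook or the pyspark shell), this setting is ignored and the --packages option shown above is the safer route.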

Upvotes: 0

user10713663

Reputation: 17

The documentation you've linked clearly says that:

The spark-avro module is external and not included in spark-submit or spark-shell by default.

and further explains how to include the package.

So your statement:

Since Spark 2.4.0 it's possible to save as AVRO without external jars.

is just incorrect.

Upvotes: 0
