Reputation: 93
Since Spark 2.4.0 it's possible to save as Avro without external JARs. However, I can't get it working at all. My code looks like this:
key = 'filename.avro'
df.write.mode('overwrite').format("avro").save(key)
I get the following error:
pyspark.sql.utils.AnalysisException: 'Failed to find data source: avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".;'
So I look at the Apache Avro Data Source Guide (https://spark.apache.org/docs/latest/sql-data-sources-avro.html) and it gives the following example:
df=spark.read.format("avro").load("examples/src/main/resources/users.avro")
df.select("name","favorite_color").write.format("avro").save("namesAndFavColors.avro")
It is the same, so I'm lost. Does anyone have an idea what's going wrong?
Upvotes: 2
Views: 4124
Reputation: 632
You can use this line to save in Avro format:
df2.write.format("avro").save(file_location + "file_name.avro")
Upvotes: 1
Reputation: 1525
The spark-avro module is external and not included in spark-submit or spark-shell by default.
As with any Spark application, spark-submit is used to launch it. spark-avro_2.11 and its dependencies can be added directly to spark-submit using --packages, such as:
./bin/spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.0 ...
For experimenting in spark-shell, you can also use --packages to add org.apache.spark:spark-avro_2.11 and its dependencies directly:
./bin/spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.0 ...
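Since the question uses PySpark, the same --packages flag applies to the pyspark launcher as well. A minimal sketch (the path and the 2.11/2.4.0 coordinates are illustrative; match them to your own Spark and Scala build):

```shell
# Illustrative: launch the PySpark shell with the external Avro module.
# The artifact spark-avro_2.11:2.4.0 assumes Spark 2.4.0 built on Scala 2.11;
# adjust both version numbers to match your installation.
./bin/pyspark --packages org.apache.spark:spark-avro_2.11:2.4.0
```

With the package on the classpath, the original `df.write.format("avro").save(...)` call from the question should work unchanged.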
Upvotes: 0
Reputation: 17
The documentation you've linked clearly states:
The spark-avro module is external and not included in spark-submit or spark-shell by default.
and further explains how to include the package.
So your statement:
Since Spark 2.4.0 it's possible to save as AVRO without external jars.
is just incorrect.
Upvotes: 0