Reputation: 1343
Using AWS glue developer endpoint Spark Version - 2.4
Python Version- 3
Code:
df=spark.read.format("avro").load("s3://dataexport/users/prod-users.avro")
Getting the following error message while trying to read avro file:
Failed to find data source: avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".;
Found the following links, but not helpful to resolve my issue
https://spark.apache.org/docs/latest/sql-data-sources-avro.html[Apache Avro Data Source Guide][1]
Apache Avro as a Built-in Data Source in Apache Spark 2.4
Upvotes: 2
Views: 4062
Reputation: 1171
Have you imported the package while starting the shell? If not you need to start a shell as below. Below package is applicable for spark 2.4+ version.
pyspark --packages com.databricks:spark-avro_2.11:4.0.0
Also write as below inside read.format:
df=spark.read.format("com.databricks.spark.avro").load("s3://dataexport/users/prod-users.avro")
Note: For pyspark you need to write 'com.databricks.spark.avro' instead of 'avro'.
Upvotes: 1