Reputation: 17
I am trying to read Event Hub data in Avro format, but I am having trouble loading it into a DataFrame in Databricks.
Here is the code I am using. Please let me know if I am doing anything wrong.
path = '/mnt/datastore/origin/zone=raw/subject=customer_events/source=EventHub/ver=1.0/*.avro'
df = spark.read.format("com.databricks.spark.avro") \
    .load(path)
Error
IllegalArgumentException: 'java.net.URISyntaxException: Relative path in absolute URI:
I tried some code to get rid of the error, but now I am getting a syntax error:
import org.apache.spark.sql.SparkSession
SparkSession spark = SparkSession
.builder()
.config("spark.sql.warehouse.dir","/mnt/datastore/origin/zone=raw/subject=customer_events/source=EventHub/ver=1.0/")
.getOrCreate()
SyntaxError: invalid syntax
File "<command-265213674761208>", line 2
SparkSession spark = SparkSession
Upvotes: 1
Views: 1201
Reputation: 191758
Relative path in absolute URI
You need to specify the protocol (URI scheme) rather than use a bare /mnt path, for example wasb://some/path/ if you are reading from Azure Blob Storage.
You can also drop the *.avro glob, since the Avro reader should already pick up all Avro files in the path.
https://docs.databricks.com/data/data-sources/read-avro.html#python-api
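As a rough sketch of the two suggestions above (explicit scheme, directory instead of a glob), something like the following might work. The container name, storage account name, and the assumption that the directory sits at the root of the container are all hypothetical; the helper names `build_wasbs_uri` and `read_avro_dir` are made up for illustration. It also assumes a runtime where the short format name "avro" is available; on older runtimes keep "com.databricks.spark.avro" as in the question.

```python
def build_wasbs_uri(container, account, directory):
    """Build a fully qualified wasbs:// URI for a directory in Azure Blob Storage."""
    return "wasbs://{}@{}.blob.core.windows.net/{}".format(
        container, account, directory.strip("/")
    )


def read_avro_dir(spark, uri):
    """Load every Avro file under `uri` into a DataFrame (no *.avro glob needed)."""
    # Assumption: the "avro" short name is registered on this runtime.
    return spark.read.format("avro").load(uri)


# Hypothetical container and account; the directory is taken from the question.
uri = build_wasbs_uri(
    "mycontainer",
    "myaccount",
    "/origin/zone=raw/subject=customer_events/source=EventHub/ver=1.0/",
)
print(uri)
```

In a notebook you would then call `df = read_avro_dir(spark, uri)`, using the `spark` session that Databricks already provides.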
And if you want to read from Event Hubs directly, it exposes a Kafka API, not a file path, AFAIK.
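If you go the Kafka route, a minimal sketch of the reader options could look like this. The namespace, event hub name, and connection string are placeholders, and the helper names are hypothetical. It assumes the Event Hubs Kafka endpoint on port 9093 with SASL_SSL/PLAIN authentication; on Databricks the login module class may need a `kafkashaded.` prefix, so treat the JAAS string as something to verify against your runtime.

```python
def eventhubs_kafka_options(namespace, event_hub, connection_string):
    """Build Spark Kafka source options for an Event Hubs Kafka endpoint."""
    # Event Hubs authenticates Kafka clients with the literal user "$ConnectionString".
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        'username="$ConnectionString" password="{}";'.format(connection_string)
    )
    return {
        "kafka.bootstrap.servers": "{}.servicebus.windows.net:9093".format(namespace),
        "subscribe": event_hub,  # the event hub name acts as the Kafka topic
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
    }


def read_event_hub_stream(spark, options):
    """Open a streaming DataFrame over the Kafka endpoint using the given options."""
    return spark.readStream.format("kafka").options(**options).load()


# Placeholder values only; substitute your own namespace, hub, and connection string.
opts = eventhubs_kafka_options("mynamespace", "customer_events", "Endpoint=sb://placeholder")
print(opts["kafka.bootstrap.servers"])
```

The resulting stream carries the payload in the binary `value` column, so the Avro decoding would still have to happen afterwards.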
Upvotes: 1