Reputation: 179
I am trying to create a simple Spark Structured Streaming app where I need to read a stream from Kafka. However, when I run the following code:
df = spark \
.readStream \
.format("kafka") \
.option("kafka.bootstrap.servers", "localhost:9092") \
.option("subscribe, "mytopic") \
.load()
Then I'm getting the following error:
AnalysisException: Failed to find data source: kafka. Please deploy the application as per the deployment section of "Structured Streaming + Kafka Integration Guide".
So, according to the Structured Streaming + Kafka Integration Guide, I need to run the following command:
./bin/spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2 ...
This gives me the following error which I do not understand:
Exception in thread "main" org.apache.spark.SparkException: Failed to get main class in JAR with error 'File file:/home/myname/spark-3.1.2-bin-hadoop3.2/... does not exist' Please specify one with --class.
Note: I am inside my spark-3.1.2-bin-hadoop3.2 folder when executing this command.
Upvotes: 0
Views: 833
Reputation: 191728
according to the Structured Streaming + Kafka Integration Guide, I need to run the following command:
The ... is not literal. You need to provide the rest of the command yourself, which includes --class and your application file. See:
https://spark.apache.org/docs/latest/submitting-applications.html
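As a rough sketch, a full submission might look like the lines below. The file names (my_kafka_app.py, my-kafka-app.jar) and the class name (com.example.MyKafkaApp) are placeholders for illustration, not something from your setup.

For a PySpark script, the script path itself is the last argument:

./bin/spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2 \
  my_kafka_app.py

For a Scala/Java application packaged as a JAR, --class names the entry point and the JAR path follows:

./bin/spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2 \
  --class com.example.MyKafkaApp \
  my-kafka-app.jar

In your case spark-submit stopped with "Please specify one with --class" because it had nothing to run after the --packages option.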
Upvotes: 1