Robin

Reputation: 179

PySpark and Kafka: org.apache.spark.SparkException: Failed to get main class in JAR with error 'File file... does not exist'

I am trying to create a simple Spark Structured Streaming app where I need to read a stream from Kafka. However, when I run the following code:

df = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:9092") \
    .option("subscribe", "mytopic") \
    .load()

Then I'm getting the following error:

AnalysisException: Failed to find data source: kafka. Please deploy the application as per the deployment section of "Structured Streaming + Kafka Integration Guide".

So, according to the Structured Streaming + Kafka Integration Guide, I need to run the following command:

./bin/spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2 ...

This gives me the following error which I do not understand:

Exception in thread "main" org.apache.spark.SparkException: Failed to get main class in JAR with error 'File file:/home/myname/spark-3.1.2-bin-hadoop3.2/... does not exist' Please specify one with --class.

Note: I am inside my spark-3.1.2-bin-hadoop3.2 folder when executing this command.

Upvotes: 0

Views: 833

Answers (1)

OneCricketeer

Reputation: 191728

"according to the Structured Streaming + Kafka Integration Guide, I need to run the following command:"

The ... is not literal. You need to provide the rest of the command, which includes --class.

https://spark.apache.org/docs/latest/submitting-applications.html
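For illustration only, a complete command might look like the sketch below; the script name streaming_app.py is a placeholder, not something from the question. For a Python application, the last argument is the path to your script (--class applies to JAR-based applications):

./bin/spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2 \
  /path/to/streaming_app.py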

Upvotes: 1
