TripleH

Reputation: 479

Running the PySpark Kafka streaming example fails with an error

I tried to run the Spark Streaming example "kafka_wordcount.py", which is in the folder /usr/local/spark/examples/src/main/python/streaming.

The code itself gives the following command to run it:

$ bin/spark-submit --jars \
    external/kafka-assembly/target/scala-*/spark-streaming-kafka-assembly-*.jar \
    examples/src/main/python/streaming/kafka_wordcount.py \
    localhost:2181 test

Here, test is the topic name. However, I cannot find the jar or the path:

"external/kafka-assembly/target/scala-*/spark-streaming-kafka-assembly-*.jar"

So instead I created a folder "streaming/jar/", downloaded all the jars listed at http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-streaming-kafka-assembly_2.10%22 into it, and then ran:

"spark-submit --jars ~/stream-example/jars/spark-streaming-kafka-assembly_*.jar kafka_wordcount.py localhost:2181 topic"

which prints:

Error: No main class set in JAR; please specify one with --class
Run with --help for usage help or --verbose for debug output

What is wrong with this command? Where are the jars?

Thanks a ton!

Upvotes: 3

Views: 1287

Answers (1)

Maximiliano Guerra

Reputation: 323

This question was asked long ago, so I assume you have figured it out by now. But as I just had the same problem, I will post the solution that worked for me.

The deployment section of this guide (http://spark.apache.org/docs/latest/streaming-kafka-integration.html) says you can pass the library with the --packages argument, as shown below:

bin/spark-submit \
    --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.2 \
    examples/src/main/python/streaming/kafka_wordcount.py \
    localhost:2181 test
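The value given to --packages is a Maven coordinate of the form groupId:artifactId:version. In the coordinate above, the _2.10 suffix is the Scala version the artifact was built for and 1.6.2 is the Spark release, so both should match your own installation. As a rough sketch (the helper name kafka_package is made up for illustration, and the artifact naming pattern is simply taken from the command above, so it may differ for other Spark releases):

```python
def kafka_package(scala_version, spark_version):
    """Return a Maven coordinate (groupId:artifactId:version) for --packages.

    Hypothetical helper. The artifact name pattern is copied from the
    command in this answer and is an assumption for any other release.
    """
    artifact = "spark-streaming-kafka_" + scala_version
    return ":".join(["org.apache.spark", artifact, spark_version])
```

Calling kafka_package("2.10", "1.6.2") gives the same string used in the command above.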

You can also download the jar itself here: http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-streaming-kafka-assembly_2.10%22
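If you go the downloaded-jar route, keep in mind that --jars expects a single comma-separated list. A shell glob that matches several files expands into space-separated paths, so only the first one is taken as the --jars value and the next jar is treated as the application itself. That is a likely cause of the "No main class set in JAR" error in the question, since all the jars were downloaded into one folder. A minimal sketch of building the argument list safely (build_submit_args is a hypothetical helper, not part of Spark):

```python
import glob
import os


def build_submit_args(jar_pattern, script, script_args):
    """Build a spark-submit argument list with all matching jars comma-joined.

    Hypothetical helper for illustration only.
    """
    # Expand "~" and the wildcard ourselves instead of relying on the shell.
    jars = sorted(glob.glob(os.path.expanduser(jar_pattern)))
    if not jars:
        raise FileNotFoundError("no jars match " + jar_pattern)
    # --jars takes ONE value: the jar paths separated by commas.
    return ["spark-submit", "--jars", ",".join(jars), script] + list(script_args)
```

For example, build_submit_args("~/stream-example/jars/spark-streaming-kafka-assembly_*.jar", "kafka_wordcount.py", ["localhost:2181", "topic"]) returns a list you could pass to subprocess.call. Keeping only one assembly jar in the folder avoids the problem as well.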

Note: I didn't run the command above. I tested with this other example instead, but it should work the same way:

bin/spark-submit \
    --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.2 \
    examples/src/main/python/streaming/direct_kafka_wordcount.py \
    localhost:9092 test

Upvotes: 2
