Reputation: 479
I am trying to run the spark-streaming example "kafka_wordcount.py", which is in the folder /usr/local/spark/examples/src/main/python/streaming
The script's header gives this command for running it:
$ bin/spark-submit --jars \
external/kafka-assembly/target/scala-*/spark-streaming-kafka-assembly-*.jar \
examples/src/main/python/streaming/kafka_wordcount.py \
localhost:2181 test
Here test is the topic name. But I cannot find the jar or the path:
"external/kafka-assembly/target/scala-*/spark-streaming-kafka-assembly-*.jar"
So instead I created a folder "streaming/jar/", put into it all the jars from http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-streaming-kafka-assembly_2.10%22 and then ran
"spark-submit --jars ~/stream-example/jars/spark-streaming-kafka-assembly_*.jar kafka_wordcount.py localhost:2181 topic"
which prints
"Error: No main class set in JAR; please specify one with --class
Run with --help for usage help or --verbose for debug output"
What is wrong with that? Where are the jars?
A ton of Thanks!!
Upvotes: 3
Views: 1287
Reputation: 323
This question was asked long ago, so I assume you have figured it out by now. But since I just had the same problem, I will post the solution that worked for me.
The deployment section of this guide (http://spark.apache.org/docs/latest/streaming-kafka-integration.html) says you can pass the library with the --packages argument, like below:
bin/spark-submit \
--packages org.apache.spark:spark-streaming-kafka_2.10:1.6.2 \
examples/src/main/python/streaming/kafka_wordcount.py \
localhost:2181 test
You can also download the jar itself here: http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-streaming-kafka-assembly_2.10%22
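If you go the downloaded-jar route, one plausible explanation for the "No main class set in JAR" error in the question is the unquoted glob. --jars expects a single comma-separated value, so if the pattern matches more than one file, the shell expands it into several separate arguments, and spark-submit would then treat the second jar as the application to run. I have not verified that this is what happened in the question; it assumes the jars folder held several versions. A minimal sketch of building the argument list with the jars joined by commas (the helper names find_jars and build_submit_args are made up for illustration):

```python
import glob
import os


def find_jars(pattern):
    """Expand a glob pattern ourselves instead of letting the shell do it."""
    return sorted(glob.glob(os.path.expanduser(pattern)))


def build_submit_args(jars, script, script_args):
    """Return a spark-submit argument list with --jars as ONE value."""
    if not jars:
        raise ValueError("expected at least one jar")
    # --jars takes a comma-separated list: a.jar,b.jar (no spaces).
    return ["spark-submit", "--jars", ",".join(jars), script] + list(script_args)


if __name__ == "__main__":
    # Path and topic are the ones from the question; adjust to your setup.
    jars = find_jars("~/stream-example/jars/spark-streaming-kafka-assembly_*.jar")
    if jars:
        print(" ".join(build_submit_args(
            jars, "kafka_wordcount.py", ["localhost:2181", "topic"])))
    else:
        print("no matching jars found")
```

Keeping only the one assembly jar that matches your Spark and Scala versions in that folder would avoid the issue as well.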
Note: I didn't run the kafka_wordcount.py command above myself; I tested with this other example, but it should work the same way:
bin/spark-submit \
--packages org.apache.spark:spark-streaming-kafka_2.10:1.6.2 \
examples/src/main/python/streaming/direct_kafka_wordcount.py \
localhost:9092 test
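The --packages value in both commands is a Maven coordinate of the form groupId:artifactId:version. The _2.10 suffix is the Scala version and 1.6.2 is the Spark version, so they should be changed to match your own installation. A small sketch of how the coordinate is put together (the function name kafka_package is made up, and the defaults are just the values I used):

```python
def kafka_package(scala_version="2.10", spark_version="1.6.2"):
    """Build the Maven coordinate passed to spark-submit --packages."""
    return "org.apache.spark:spark-streaming-kafka_{}:{}".format(
        scala_version, spark_version)


if __name__ == "__main__":
    args = [
        "bin/spark-submit",
        "--packages", kafka_package(),
        "examples/src/main/python/streaming/direct_kafka_wordcount.py",
        "localhost:9092", "test",
    ]
    print(" ".join(args))
```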
Upvotes: 2