Reputation: 2995
I'm (very) new to spark, so apologies if this is a stupid question.
I am trying to execute the spark (2.2.0) python spark streaming example, however I keep running into the issue below:
Traceback (most recent call last):
File "/Users/rmanoch/Downloads/spark-2.2.0-bin-hadoop2.7/kinesis_wordcount_asl.py", line 76, in <module>
ssc, appName, streamName, endpointUrl, regionName, InitialPositionInStream.LATEST, 2)
File "/Users/rmanoch/Downloads/spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/streaming/kinesis.py", line 92, in createStream
File "/Users/rmanoch/Downloads/spark-2.2.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/Users/rmanoch/Downloads/spark-2.2.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 323, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o27.createStream. Trace:
py4j.Py4JException: Method createStream([class org.apache.spark.streaming.api.java.JavaStreamingContext, class java.lang.String, class java.lang.String, class java.lang.String, class java.lang.String, class java.lang.Integer, class org.apache.spark.streaming.Duration, class org.apache.spark.storage.StorageLevel, null, null, null, null, null]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
at py4j.Gateway.invoke(Gateway.java:272)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
The tarball I downloaded from spark's website did not include the external folder in it (seems like there's some license issue), so this is the command I have been trying to execute (after downloading kinesis_wordcount_asl.py
from github)
bin/spark-submit --packages org.apache.spark:spark-streaming-kinesis-asl_2.11:2.2.0 kinesis_wordcount_asl.py sparkEnrichedDev relay-enriched-dev https://kinesis.us-west-2.amazonaws.com us-west-2
Happy to provide any additional details if needed.
Upvotes: 5
Views: 1032
Reputation: 353
I case somebody winds up here like I did, this is due to version mismatches. I was having the same problem and I managed to solve it by matching the corresponding versions to the kinesis package. Both numbers should match the Scala version used for compiling the libraries and the Spark version. I for example have the following:
$ spark-submit --version
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.5
/_/
Using Scala version 2.11.12, OpenJDK 64-Bit Server VM, 1.8.0_222
Branch HEAD
Compiled by user centos on 2020-02-02T19:38:06Z
Revision cee4ecbb16917fa85f02c635925e2687400aa56b
Url https://gitbox.apache.org/repos/asf/spark.git
Type --help for more information.
This corresponds to Spark 2.4.5
compiled using Scala 2.11.12
. Therefore, the corresponding package should be
spark-submit --packages org.apache.spark:spark-streaming-kinesis-asl_2.11:2.4.5 kinesis_...
Upvotes: 0
Reputation: 35219
Based on the exception it looks like there is a version mismatch between core Spark / Spark streaming and spark-kinesis
. API changed between Spark 2.1 and 2.2 (SPARK-19405) and version mismatch would cause a similar error.
This makes me think you're submitting using incorrect binaries (just a guess) - it can be PATH
, PYTHONPATH
or SPARK_HOME
issue if you use local
mode. Because you get signature mismatch we can assume that spark-kinesis
is loaded correctly and org.apache.spark.streaming.kinesis.KinesisUtilsPythonHelper
is present on the CLASSPATH
.
Upvotes: 3