Reputation: 77
I'm trying to connect to an AWS Redis cluster from an EMR cluster. I uploaded the jar driver to S3 and used this bootstrap action to copy the jar file to the cluster nodes:
aws s3 cp s3://sparkbucket/spark-redis-2.3.0.jar /home/hadoop/spark-redis-2.3.0.jar
This is my connection test spark app:
import sys
from pyspark.sql import SparkSession

if __name__ == "__main__":
    # Build a session pointed at the Redis endpoint
    spark = SparkSession.builder \
        .config("spark.redis.host", "testredis-0013.vb4vgr.00341.eu1.cache.amazonaws.com") \
        .config("spark.redis.port", "6379") \
        .appName("Redis_test") \
        .getOrCreate()

    # Read every key from Redis and dump the result to S3 as CSV
    df = spark.read.format("org.apache.spark.sql.redis") \
        .option("key.column", "key") \
        .option("keys.pattern", "*") \
        .load()
    df.write.csv(path='s3://sparkbucket/', sep=',')

    spark.stop()
When running the application with this spark-submit command:
spark-submit --deploy-mode cluster --driver-class-path /home/hadoop/spark-redis-2.3.0.jar s3://sparkbucket/testredis.py
I get the following error, and I'm not sure what I did wrong:
ERROR Client: Application diagnostics message: User application exited with status 1 Exception in thread "main" org.apache.spark.SparkException: Application application_1658168513779_0001 finished with failed status
Upvotes: 1
Views: 1287
Reputation: 96
With similar test code, I was able to run successfully by uploading the spark-redis jar to S3 and passing it with --jars, as follows. Unlike --driver-class-path, which only prepends to the driver's classpath and in cluster mode expects the file to already exist locally on the node, --jars ships the jar to both the driver and the executors:
spark-submit --deploy-mode cluster --jars s3://<bucket/path>/spark-redis_2.12-3.1.0-SNAPSHOT-jar-with-dependencies.jar s3://<bucket/path>/redis_test.py
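If you'd rather not host the jar in S3 yourself, spark-submit can also pull it from Maven Central with --packages. This is just a sketch; the exact coordinates below are an assumption, so check Maven Central for the version matching your Spark/Scala build:
# Coordinates are an assumption; verify the version for your Spark/Scala build
spark-submit --deploy-mode cluster \
    --packages com.redislabs:spark-redis_2.12:3.1.0 \
    s3://<bucket/path>/redis_test.py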
The detailed log for the run can be viewed in the Spark history server, which you can reach in the EMR web console through this sequence of links:
Summary -> Spark history server -> application_xxx_xxx -> Executors -> (driver)stdout
You'll get a NoSuchKey error at first, since it takes some time for the log to become available; just reload.
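If you'd rather not wait for the history server, a quick alternative is to fetch the aggregated YARN logs directly from the EMR master node; the application ID below is the one from the error message in the question:
# Run on the EMR master node; substitute your own application ID
yarn logs -applicationId application_1658168513779_0001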
Upvotes: 1