iamabhaykmr

Reputation: 2011

java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager.<init>(Lcom/amazonaws/services/s3/AmazonS3;Ljava/util/concurrent/ThreadPoolExecutor;)V

I am trying to read Redshift table data into a Spark DataFrame and write that DataFrame to another Redshift table. I am using the following jars in spark-submit for this task.

Here is the command:

spark-submit --jars RedshiftJDBC41-1.2.12.1017.jar,minimal-json-0.9.4.jar,spark-avro_2.11-3.0.0.jar,spark-redshift_2.10-2.0.0.jar,aws-java-sdk-sqs-1.11.694.jar,aws-java-sdk-s3-1.11.694.jar,aws-java-sdk-core-1.11.694.jar --packages org.apache.hadoop:hadoop-aws:2.7.4 t.py 

I tried changing the versions of all the jars, and the hadoop-aws version accordingly, as suggested in various Stack Overflow answers, with no luck.
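
For context, t.py boils down to something like the following (a minimal sketch; the JDBC URL, user, password, and table names are placeholders, not the real values):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("redshift-to-redshift").getOrCreate()

# Read the source Redshift table into a DataFrame, staging through the S3 tempdir.
df = spark.read \
    .format("com.databricks.spark.redshift") \
    .option("url", "jdbc:redshift://<host>:5439/<db>?user=<user>&password=<pass>") \
    .option("dbtable", "source_table") \
    .option("tempdir", "s3a://AccessKey:AccessSecret@big-query-to-rs/rs_temp_data") \
    .load()

# Write the DataFrame to the target Redshift table via the same S3 staging area.
df.write \
    .format("com.databricks.spark.redshift") \
    .option("url", "jdbc:redshift://<host>:5439/<db>?user=<user>&password=<pass>") \
    .option("dbtable", "target_table") \
    .option("tempdir", "s3a://AccessKey:AccessSecret@big-query-to-rs/rs_temp_data") \
    .save()

The load fails as soon as the s3a filesystem is initialized: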

Traceback (most recent call last):
  File "/home/ubuntu/trell-ds-framework/data_engineering/data_migration/t.py", line 21, in <module>
    .option("tempdir", "s3a://AccessKey:AccessSecret@big-query-to-rs/rs_temp_data") \
  File "/home/ubuntu/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 172, in load
  File "/home/ubuntu/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1160, in __call__
  File "/home/ubuntu/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/home/ubuntu/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o37.load.
: java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager.<init>(Lcom/amazonaws/services/s3/AmazonS3;Ljava/util/concurrent/ThreadPoolExecutor;)V
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:287)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
    at com.databricks.spark.redshift.Utils$.assertThatFileSystemIsNotS3BlockFileSystem(Utils.scala:124)
    at com.databricks.spark.redshift.RedshiftRelation.<init>(RedshiftRelation.scala:52)
    at com.databricks.spark.redshift.DefaultSource.createRelation(DefaultSource.scala:49)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)

2019-12-21 14:38:03 INFO  SparkContext:54 - Invoking stop() from shutdown hook
2019-12-21 14:38:03 INFO  AbstractConnector:318 - Stopped Spark@3c115b0a{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2019-12-21 14:38:03 INFO  SparkUI:54 - Stopped Spark web UI at http://ip-172-30-1-193.ap-south-1.compute.internal:4040
2019-12-21 14:38:03 INFO  MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!

Can anybody help me figure out what the issue is? Is it a problem with the jar libraries, with Hadoop, or something else?

Thanks.

Upvotes: 2

Views: 3165

Answers (4)

stevel

Reputation: 13440

As the s3a troubleshooting docs say:

Do not attempt to “drop in” a newer version of the AWS SDK than that which the Hadoop version was built with. Whatever problem you have, changing the AWS SDK version will not fix things; it will only change the stack traces you see.

Stop using eight-year-old Hadoop 2.7 binaries; upgrade to a Spark build that ships Hadoop 3.3.1 binaries and try again.
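
For example, with a Spark distribution built against Hadoop 3.3.1, something like this (a sketch: --packages pulls in the aws-java-sdk-bundle version that hadoop-aws 3.3.1 was built with as a transitive dependency, so drop the individually pinned aws-java-sdk-* jars from the question's command so they cannot shadow it):

spark-submit --packages org.apache.hadoop:hadoop-aws:3.3.1 --jars RedshiftJDBC41-1.2.12.1017.jar t.py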

Upvotes: 0

Greg

Reputation: 5588

In my case I was using the s3a connector and PySpark on SageMaker with Hadoop 2.7.3 jars. To fix it, I needed to install the following jars: hadoop-aws-2.7.3.jar and aws-java-sdk-1.7.4.jar. Using these instead of the documented aws-java-sdk-bundle jar resolved my issue.

My code:

from pyspark.sql import SparkSession

# Put the matching hadoop-aws and aws-java-sdk jars on the classpath before the session starts.
builder = SparkSession.builder.appName('appName')
builder = builder.config('spark.jars', 'hadoop-aws-2.7.3.jar,aws-java-sdk-1.7.4.jar')
spark = builder.getOrCreate()

Upvotes: 1

WannaGetHigh

Reputation: 3914

I had the same issue while using spark-shell.

The issue was that the jar was loaded after all the other jars; I moved it to the front of the --jars list and the issue was resolved, as in the sketch below.
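
A sketch of what that looks like with the jars from the question (the jar names here are borrowed from the question, not from my own setup; the point is only that the AWS SDK jars come first):

spark-shell --jars aws-java-sdk-core-1.11.694.jar,aws-java-sdk-s3-1.11.694.jar,RedshiftJDBC41-1.2.12.1017.jar,spark-redshift_2.10-2.0.0.jar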

Upvotes: 0

Dheeraj

Reputation: 342

I ran into a similar kind of error a few days back.

I was able to resolve it by upgrading the aws-java-sdk-s3 jar to the latest version.
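
For example, against the question's spark-submit command this would mean swapping the aws-java-sdk-* jars for newer ones in lockstep (the version number below is purely illustrative; I have not verified it against this setup):

spark-submit --jars RedshiftJDBC41-1.2.12.1017.jar,minimal-json-0.9.4.jar,spark-avro_2.11-3.0.0.jar,spark-redshift_2.10-2.0.0.jar,aws-java-sdk-sqs-1.11.1000.jar,aws-java-sdk-s3-1.11.1000.jar,aws-java-sdk-core-1.11.1000.jar --packages org.apache.hadoop:hadoop-aws:2.7.4 t.py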

Upvotes: 1
