Reputation: 6925
I am trying to run my Spark job on a Spark cluster that I created with the spark-ec2 script that ships with Spark. I can run the SparkPi example, but whenever I run my own job I get this exception:
Exception in thread "main" java.io.IOException: Call to ec2-XXXXXXXXXX.compute-1.amazonaws.com/10.XXX.YYY.ZZZZ:9000 failed on local exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1107)
at org.apache.hadoop.ipc.Client.call(Client.java:1075)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at com.sun.proxy.$Proxy6.setPermission(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at com.sun.proxy.$Proxy6.setPermission(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.setPermission(DFSClient.java:1042)
at org.apache.hadoop.hdfs.DistributedFileSystem.setPermission(DistributedFileSystem.java:531)
at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:93)
at org.apache.spark.util.FileLogger.start(FileLogger.scala:70)
at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:71)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:252)
at com.here.traffic.collection.archiver.IsoCcMergeJob$.isoMerge(IsoCcMergeJob.scala:55)
at com.here.traffic.collection.archiver.IsoCcMergeJob$.main(IsoCcMergeJob.scala:11)
at com.here.traffic.collection.archiver.IsoCcMergeJob.main(IsoCcMergeJob.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:804)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:749)
From what I have read while searching for a solution, this could be a mismatch in the Hadoop library version, but I verified that Spark is using 1.0.4 and that my job was compiled against the same version.
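One way to double-check the client side of that claim is to print the Hadoop version from inside the JVM that actually runs the job, since the jars on the runtime classpath can differ from the ones used at compile time. The following is only a minimal sketch: the object name `HadoopVersionCheck` is hypothetical, and it assumes the Hadoop client classes are on the classpath when the class is submitted the same way as the job.

```scala
import org.apache.hadoop.util.VersionInfo

// Hypothetical helper object, not part of the original job.
object HadoopVersionCheck {
  def main(args: Array[String]): Unit = {
    // Reports the version of the Hadoop classes loaded on this classpath,
    // i.e. the client side only, not the version the NameNode is running.
    println("Hadoop client version: " + VersionInfo.getVersion)
  }
}
```

The server side would still need to be checked on the cluster itself, for example by running `hadoop version` on the master node.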
To give some more context, my job does a left outer join of two files that live in S3 and writes the result back to S3.
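For illustration, a job of that shape might look roughly like the sketch below. This is not the actual `IsoCcMergeJob` code: the object name, the `s3n://example-bucket/...` paths, and the assumption that each line is keyed by its first comma-separated field are all hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits, needed on Spark 1.x
import org.apache.spark.rdd.RDD

// Hypothetical sketch of a left outer join between two S3 inputs.
object LeftJoinSketch {
  // Assumption: the join key is the text before the first comma on each line.
  def toPair(line: String): (String, String) = {
    val i = line.indexOf(',')
    if (i < 0) (line, "") else (line.substring(0, i), line.substring(i + 1))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("left-join-sketch"))

    // Placeholder bucket and prefixes; replace with real locations.
    val left: RDD[(String, String)] =
      sc.textFile("s3n://example-bucket/left/").map(toPair)
    val right: RDD[(String, String)] =
      sc.textFile("s3n://example-bucket/right/").map(toPair)

    // leftOuterJoin keeps every left record; the right side is an Option.
    left.leftOuterJoin(right)
      .map { case (key, (l, r)) => Seq(key, l, r.getOrElse("")).mkString(",") }
      .saveAsTextFile("s3n://example-bucket/joined/")

    sc.stop()
  }
}
```

Note that in the stack trace above the failure happens inside the `SparkContext` constructor, so the job never reaches the point where the join itself would run.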
Any ideas what could be wrong?
Upvotes: 1
Views: 567
Reputation: 31553
I had similar experiences using the ec2 scripts. Nearly all of the version problems went away once we used the Cloudera distribution (CDH 5.1) both for the cluster (via a nice, simple apt-get) and for the jar dependency.
Installing Spark: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/cdh5ig_spark_installation.html
Adding Spark as a dependency (search for the text “spark”):
Upvotes: 1