Reputation: 29
I have a Spark Scala program which loads a jar I wrote in Java. From that jar a static function is called, which tries to read a serialized object from a file (Pattern.class), but it throws a java.lang.ClassNotFoundException.
Running the Spark program locally works, but on the cluster workers it doesn't. It's especially weird because, before I try to read from the file, I instantiate a Pattern object and there are no problems.
I am sure that the Pattern objects I wrote to the file are the same as the Pattern objects I am trying to read.
I've checked the jar on the slave machine and the Pattern class is there.
Does anyone have any idea what the problem might be? I can add more detail if needed.
This is the Pattern class:
public class Pattern implements Serializable {

    private static final long serialVersionUID = 588249593084959064L;

    public static enum RelationPatternType {NONE, LEFT, RIGHT, BOTH};

    RelationPatternType type;
    String entity;
    String pattern;
    List<Token> tokens;
    Relation relation = null;

    public Pattern(RelationPatternType type, String entity, List<Token> tokens, Relation relation) {
        this.type = type;
        this.entity = entity;
        this.tokens = tokens;
        this.relation = relation;
        if (this.tokens != null)
            this.pattern = StringUtils.join(" ", this.tokens.toString());
    }
}
I am reading the file from S3 the following way:
AmazonS3 s3Client = new AmazonS3Client(credentials);
S3Object confidentPatternsObject = s3Client.getObject(new GetObjectRequest("xxx", "confidentPatterns"));
InputStream objectData = confidentPatternsObject.getObjectContent();
ObjectInputStream ois = new ObjectInputStream(objectData);
Map<Pattern, Tuple2<Integer, Integer>> confidentPatterns = (Map<Pattern, Tuple2<Integer, Integer>>) ois.readObject();
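For what it's worth, here is a sketch (illustrative only, not part of my current code) of a workaround I have seen suggested for this kind of symptom: make readObject resolve classes through the thread context class loader, which on the executors should include the application jar. The reading happens in the Java jar, but the idea is the same written in Scala:
import java.io.{InputStream, ObjectInputStream, ObjectStreamClass}

// Illustrative class name; resolves deserialized classes via the thread context
// class loader and falls back to the default lookup (e.g. for primitive types).
class ContextAwareObjectInputStream(in: InputStream) extends ObjectInputStream(in) {
  override protected def resolveClass(desc: ObjectStreamClass): Class[_] =
    try Class.forName(desc.getName, false, Thread.currentThread().getContextClassLoader)
    catch { case _: ClassNotFoundException => super.resolveClass(desc) }
}

// usage instead of the plain ObjectInputStream above:
// val ois = new ContextAwareObjectInputStream(objectData)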
Later edit: I checked the classpath at runtime and the path to the jar was not there. I added it for the executors, but I still have the same problem. I don't think that was the cause, since the Pattern class is inside the jar that calls the readObject function.
Upvotes: 0
Views: 1656
Reputation: 29185
I would suggest adding this kind of method to print the classpath resources before the call, so you can make sure everything is fine from the caller's point of view:
public static void printClassPathResources() {
    // Note: casting the system class loader to URLClassLoader works on Java 8 but not on Java 9+.
    final ClassLoader cl = ClassLoader.getSystemClassLoader();
    final URL[] urls = ((URLClassLoader) cl).getURLs();
    LOG.info("Printing all classpath resources visible to the currently running class");
    for (final URL url : urls) {
        LOG.info(url.getFile());
    }
}
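For example, you could call it once on the driver and once inside a task to compare what the two JVMs see (the class name DebugUtils and the variable sc below are hypothetical placeholders for wherever you put the method and your SparkContext):
// on the driver
DebugUtils.printClassPathResources()

// on one executor
sc.parallelize(Seq(1), numSlices = 1).foreachPartition { _ =>
  DebugUtils.printClassPathResources()
}
If the executor-side call itself fails with a ClassNotFoundException, that by itself suggests the application jar is not reaching the executors at all.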
Then pass the extra classpath entries when submitting the job, for example:
--conf "spark.driver.extraLibraryPath=$HADOOP_HOME/*:$HBASE_HOME/*:$HADOOP_HOME/lib/*:$HBASE_HOME/lib/htrace-core-3.1.0-incubating.jar:$HDFS_PATH/*:$SOLR_HOME/*:$SOLR_HOME/lib/*" \
--conf "spark.executor.extraLibraryPath=$HADOOP_HOME/*" \
--conf "spark.executor.extraClassPath=$(echo /your directory of jars/*.jar | tr ' ' ':')"
Or set the application jar explicitly when building the SparkConf, so it is shipped to the executors:
val conf = new SparkConf().setAppName(appName).setJars(Seq(System.getProperty("user.dir") + "/target/scala-2.10/sparktest.jar"))
This should fix the vast majority of class-not-found problems. Another option is to place your dependencies on the default classpath on all of the worker nodes in the cluster; that way you won't have to pass around a large jar.
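One way to use a jar that is already present locally on every node, sketched here with a placeholder path, is to point the executors at the node-local file via spark.executor.extraClassPath:
import org.apache.spark.SparkConf

// /opt/myapp/lib/pattern-util.jar is a placeholder; the file must exist at this path on every worker
val conf = new SparkConf()
  .setAppName(appName)
  .set("spark.executor.extraClassPath", "/opt/myapp/lib/pattern-util.jar")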
The only other major source of class-not-found issues is a mismatch in library versions. For example, if you don't use identical versions of the common libraries in your application and on the Spark cluster, you will end up with classpath issues. This can occur when you compile against one version of a library (like Spark 1.1.0) and then attempt to run against a cluster with a different or out-of-date version (like Spark 0.9.2). Make sure that you match your library versions to whatever is being loaded onto the executor classpaths. A common example of this would be compiling against an alpha build of the Spark Cassandra Connector and then attempting to run using classpath references to an older version.
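With sbt, for example, that means pinning the Spark dependency to the exact version running on the cluster and marking it as provided so the cluster's own copy is used at runtime (the version number below is just an example):
// build.sbt (illustrative version only; match it to your cluster)
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" % "provided"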
Upvotes: 0