Reputation: 65
I tried writing a simple rest controller in springboot that accepts a file and a string and searches in that file how many time the string is found, basically a word search and count.
I have a master and runner running on my pc and the springboot application connects without any problems. I tried using without a 'dedicated' spark master and it worked ok but after i made it connect to my desired problem the problem started occurring.
I also tried using lambda expressions but that got me other problems so i tried to make it easier.
Spark config:
@Configuration
public class SparkConfig {
@Value("${spark.app.name}")
private String appname;
@Value("${spark.master}")
private String masterUri;
@Bean
public SparkConf conf(){
System.out.println(appname + " " + masterUri);
return new SparkConf()
.setAppName(appname)
.setMaster(masterUri);
}
@Bean
public JavaSparkContext sc(){
return new JavaSparkContext(conf());
}
Endpoint:
@RequestMapping(value = "/file-word-count",method = RequestMethod.POST)
public String fileWordCount(@RequestParam("file") MultipartFile file, @RequestParam String word) {
return wordCountService.countFileWords(file,word);
}
service
@Service
@Component
public class WordCountService implements Serializable {
@Autowired
JavaSparkContext sc;
public String countFileWords(MultipartFile file, String word) {
String result = null;
try {
JavaRDD<String> textFile = sc.textFile(convertMultiToFile(file).getPath());
JavaRDD<String> words = textFile.filter(new Function<String, Boolean>() {
@Override
public Boolean call(String s) throws Exception {
return s.contains(word);
}
});
result = String.valueOf(words.count());
}
catch(IOException e){
System.out.println(e.getMessage());
}
return result;
}
public File convertMultiToFile(MultipartFile mFile) throws IOException {
File file = new File("temp","tempTextFile");
FileUtils.writeByteArrayToFile(file,mFile.getBytes());
return file;
}
}
console output:
2019-06-25 22:11:08.462 ERROR 5960 --- [nio-9090-exec-1] o.a.c.c.C.[.[.[/].[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is org.apache.spark.SparkException: Task not serializable] with root cause
java.io.NotSerializableException: org.apache.spark.api.java.JavaSparkContext
Serialization stack:
- object not serializable (class: org.apache.spark.api.java.JavaSparkContext, value: org.apache.spark.api.java.JavaSparkContext@47c997cb)
- field (class: com.licenta.service.WordCountService, name: sc, type: class org.apache.spark.api.java.JavaSparkContext)
- object (class com.licenta.service.WordCountService, com.licenta.service.WordCountService@4152741c)
- field (class: com.licenta.service.WordCountService$1, name: this$0, type: class com.licenta.service.WordCountService)
- object (class com.licenta.service.WordCountService$1, com.licenta.service.WordCountService$1@22ae6915)
- field (class: org.apache.spark.api.java.JavaRDD$$anonfun$filter$1, name: f$1, type: interface org.apache.spark.api.java.function.Function)
- object (class org.apache.spark.api.java.JavaRDD$$anonfun$filter$1, <function1>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40) ~[spark-core_2.11-2.4.3.jar:2.4.3]
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46) ~[spark-core_2.11-2.4.3.jar:2.4.3]
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100) ~[spark-core_2.11-2.4.3.jar:2.4.3]
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:400) ~[spark-core_2.11-2.4.3.jar:2.4.3]
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:393) ~[spark-core_2.11-2.4.3.jar:2.4.3]
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162) ~[spark-core_2.11-2.4.3.jar:2.4.3]
at org.apache.spark.SparkContext.clean(SparkContext.scala:2326) ~[spark-core_2.11-2.4.3.jar:2.4.3]
at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:388) ~[spark-core_2.11-2.4.3.jar:2.4.3]
at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:387) ~[spark-core_2.11-2.4.3.jar:2.4.3]
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) ~[spark-core_2.11-2.4.3.jar:2.4.3]
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) ~[spark-core_2.11-2.4.3.jar:2.4.3]
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363) ~[spark-core_2.11-2.4.3.jar:2.4.3]
at org.apache.spark.rdd.RDD.filter(RDD.scala:387) ~[spark-core_2.11-2.4.3.jar:2.4.3]
at org.apache.spark.api.java.JavaRDD.filter(JavaRDD.scala:78) ~[spark-core_2.11-2.4.3.jar:2.4.3]
at com.licenta.service.WordCountService.countFileWords(WordCountService.java:35) ~[classes/:na]
at com.licenta.controller.CountController.fileWordCount(CountController.java:29) ~[classes/:na]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_161]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_161]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_161]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_161]
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:189) ~[spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:138) ~[spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:102) ~[spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:892) ~[spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:797) ~[spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87) ~[spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1038) ~[spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:942) ~[spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1005) ~[spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:908) ~[spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:660) ~[tomcat-embed-core-9.0.17.jar:9.0.17]
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:882) ~[spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:741) ~[tomcat-embed-core-9.0.17.jar:9.0.17]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231) ~[tomcat-embed-core-9.0.17.jar:9.0.17]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) ~[tomcat-embed-core-9.0.17.jar:9.0.17]
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53) ~[tomcat-embed-websocket-9.0.17.jar:9.0.17]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) ~[tomcat-embed-core-9.0.17.jar:9.0.17]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) ~[tomcat-embed-core-9.0.17.jar:9.0.17]
at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:99) ~[spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) ~[spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) ~[tomcat-embed-core-9.0.17.jar:9.0.17]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) ~[tomcat-embed-core-9.0.17.jar:9.0.17]
at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:92) ~[spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) ~[spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) ~[tomcat-embed-core-9.0.17.jar:9.0.17]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) ~[tomcat-embed-core-9.0.17.jar:9.0.17]
at org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:93) ~[spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) ~[spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) ~[tomcat-embed-core-9.0.17.jar:9.0.17]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) ~[tomcat-embed-core-9.0.17.jar:9.0.17]
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:200) ~[spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) ~[spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) ~[tomcat-embed-core-9.0.17.jar:9.0.17]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) ~[tomcat-embed-core-9.0.17.jar:9.0.17]
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:200) ~[tomcat-embed-core-9.0.17.jar:9.0.17]
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96) [tomcat-embed-core-9.0.17.jar:9.0.17]
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:490) [tomcat-embed-core-9.0.17.jar:9.0.17]
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139) [tomcat-embed-core-9.0.17.jar:9.0.17]
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92) [tomcat-embed-core-9.0.17.jar:9.0.17]
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74) [tomcat-embed-core-9.0.17.jar:9.0.17]
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343) [tomcat-embed-core-9.0.17.jar:9.0.17]
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:408) [tomcat-embed-core-9.0.17.jar:9.0.17]
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66) [tomcat-embed-core-9.0.17.jar:9.0.17]
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:834) [tomcat-embed-core-9.0.17.jar:9.0.17]
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1415) [tomcat-embed-core-9.0.17.jar:9.0.17]
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) [tomcat-embed-core-9.0.17.jar:9.0.17]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_161]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_161]
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-embed-core-9.0.17.jar:9.0.17]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
2019-06-25 22:41:00.704 INFO 5960 --- [er-event-loop-8] o.apache.spark.storage.BlockManagerInfo : Removed broadcast_0_piece0 on DESKTOP-TF9C4EG:55462 in memory (size: 20.4 KB, free: 898.5 MB)
I expect to get a string that has the number of the word occurrences. Hope that i was clear in explaining, if not do tell and ill try to answer as best i can.
Upvotes: 1
Views: 2182
Reputation: 1029
Every Spark node should load classes which are required for execution of your logic. In your case it is class with countFileWords
method
To fix your problem you have to do the following steps:
countFileWords
function in separate module(you need jar file with class where countFileWords
is implemented)WordCountService
is implementedpublic void addJar(String path)
public void addJar(String path) Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. The path passed can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), or an HTTP, HTTPS or FTP URI.
Upvotes: 0
Reputation: 5558
The problem is the new Function<String, Boolean>()
, it is an anonymous class and has a reference to WordCountService
and transitive to JavaSparkContext
. To avoid that you can make it a static nested class.
static class WordCounter implements Function<String, Boolean>, Serializable {
private final String word;
public WordCounter(String word){
this.word = word;
}
@Override
public Boolean call(String s) throws Exception {
return s.contains(word);
}
}
and use it with
JavaRDD<String> words = textFile.filter(new WordCounter(word));
Upvotes: 2