user3655116

Reputation: 33

How to use Spark cluster computing functions in servlets

I am working on a dynamic web project. I want to write a servlet class that responds to a form submit request and performs some cluster computing tasks using Apache Spark (for example, calculating pi). The doGet method of the servlet (named Hello) is as follows:

public void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    String[] args = new String[2];
    args[0] = "local";
    args[1] = "4";
    double count = performSpark.cpi(args);
    //double count = 3.14;
    String text1 = String.valueOf(count);
    response.sendRedirect("wresultjsp.jsp?text1=" + text1);
}

The performSpark class is as follows:

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;

public class performSpark {
    static double cpi(String[] input) {
        JavaSparkContext jsc = new JavaSparkContext(input[0], "performspark",
                System.getenv("SPARK_HOME"), JavaSparkContext.jarOfClass(performSpark.class));

        int slices = (input.length == 2) ? Integer.parseInt(input[1]) : 2;
        int n = 1000000 * slices;
        List<Integer> l = new ArrayList<Integer>(n);
        for (int i = 0; i < n; i++) {
            l.add(1);
        }

        JavaRDD<Integer> dataSet = jsc.parallelize(l);

        // Sample a random point in the square [-1, 1] x [-1, 1] and
        // count it if it falls inside the unit circle.
        int count = dataSet.map(new Function<Integer, Integer>() {
            @Override
            public Integer call(Integer integer) {
                double x = Math.random() * 2 - 1;
                double y = Math.random() * 2 - 1;
                return (x * x + y * y < 1) ? 1 : 0;
            }
        }).reduce(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer integer, Integer integer2) {
                return integer + integer2;
            }
        });

        // The fraction of hits approximates pi/4.
        return 4.0 * count / n;
    }
}
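For readers who want to sanity-check the estimator itself without a Spark cluster, here is a minimal plain-Java sketch of the same Monte Carlo computation. This is an illustration only, not part of the original servlet; the class and method names are made up, and a seeded Random replaces Math.random for reproducibility.

```java
import java.util.Random;

// Standalone Monte Carlo estimate of pi: sample points uniformly in the
// square [-1, 1] x [-1, 1]; the fraction landing inside the unit circle
// approximates pi/4, just like the Spark version above.
public class PiEstimate {
    static double estimate(int n, long seed) {
        Random rnd = new Random(seed); // seeded so runs are repeatable
        int hits = 0;
        for (int i = 0; i < n; i++) {
            double x = rnd.nextDouble() * 2 - 1;
            double y = rnd.nextDouble() * 2 - 1;
            if (x * x + y * y < 1) {
                hits++;
            }
        }
        return 4.0 * hits / n;
    }

    public static void main(String[] args) {
        System.out.println(PiEstimate.estimate(4_000_000, 42L));
    }
}
```

With 4 million samples the estimate typically lands within a few thousandths of pi, which is a quick check that the 4.0 * count / n formula in cpi is doing the right thing.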

The spark-assembly-2.10-0.9.1-hadoop2.2.0.jar is copied to WEB-INF/lib. The build succeeds, but when I run the servlet in a Tomcat 7 server, a java.lang.ClassNotFoundException is thrown when the JavaSparkContext is created:

Servlet.service() for servlet [Hello] in context with path [/sparkdemo] threw exception [Servlet execution threw an exception] with root cause
java.lang.ClassNotFoundException: org.apache.spark.api.java.function.Function
    at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1720)
    at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1571)
    at Hello.doGet(Hello.java:54)
    at Hello.doPost(Hello.java:74)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)

Does anyone know how to fix this problem?

Upvotes: 2

Views: 2396

Answers (2)

Pushkin

Reputation: 534

We had a similar use case in our project, where we wanted to submit user queries to Spark interactively from a web application. We achieved it by first creating a Spark session and then attaching our custom servlet to it via attachHandler().

In the attachHandler method we attached our servlet class to Spark's ServletContextHandler:

ServletContextHandler handler = new ServletContextHandler();
HttpServlet servlet = new <CustomServlet>(spark);
ServletHolder sh = new ServletHolder(servlet);
handler.setContextPath(<root context>);
handler.addServlet(sh, <path>);
spark.sparkContext().ui().get().attachHandler(handler);

Now that the servlet is attached to the Spark UI, say on port 4040, you can submit requests to it directly. We overrode the doGet method of our servlet to accept a JSON payload containing the SQL to run, submitted the SQL with ds = this.spark.sql(query);, iterated over the returned Dataset, and added the results to the response object.

Another way to do this is to leverage Apache Livy.

Hope this helps.

Upvotes: 0

user3655116

Reputation: 33

I finally found the solution, as follows.

When the Tomcat server starts, it loads spark-assembly-2.10-0.9.1-hadoop2.2.0.jar and reports an error: validateJarFile (.....) - jar not loaded. See Servlet Spec3.0 ......, which indicates an overlapping jar dependency.

Then I opened spark-assembly-2.10-0.9.1-hadoop2.2.0.jar and found an overlapping javax/servlet folder inside it. After deleting that servlet folder, the jar is loaded successfully in Tomcat and the ClassNotFoundException is gone.
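A quick way to confirm this kind of overlap before unpacking jars by hand is to ask the classloader for every location that provides the offending class file; more than one URL means two jars ship the same class. Here is a minimal sketch; the class name DuplicateClassFinder is made up, and javax/servlet/Servlet.class is just the resource relevant to this particular conflict.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Lists every location on the classpath that provides a given class file.
// If javax/servlet/Servlet.class shows up both in Tomcat's servlet-api jar
// and inside the Spark assembly jar, the overlap described above is present.
public class DuplicateClassFinder {
    static List<URL> locate(String resource) throws IOException {
        return Collections.list(
                DuplicateClassFinder.class.getClassLoader().getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        // Prints one line per copy found; nothing is printed if the
        // class is not on the classpath at all.
        for (URL url : locate("javax/servlet/Servlet.class")) {
            System.out.println(url);
        }
    }
}
```

Running this from a scratch servlet inside the web app should print both the container's copy and the one inside the assembly jar when the duplicate is still there.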

Upvotes: 1
