JoseM LM

Reputation: 373

About Spark Application Architecture: Long-Lived Server

In this Cloudera blog post (http://blog.cloudera.com/blog/2014/05/apache-spark-resource-management-and-yarn-app-models/) they say this:

An application can be used for a single batch job, an interactive session with multiple jobs spaced apart, or a long-lived server continually satisfying requests.

I'm interested in the "long-lived server continually satisfying requests" mode: how can I configure Spark to work that way? I've written a very simple application that listens on a socket port and runs tasks when it receives a command, but I'm not sure this is the way things are supposed to work. Any advice, post, or book that throws some light on my path? :) Thank you!

My code is very simple and naive, but here it is:

// Before this line is the code in charge of reading the source files and creating the graph
import java.io.PrintStream
import java.net.ServerSocket
import scala.io.BufferedSource

val server = new ServerSocket(9999)
val s = server.accept()
val in = new BufferedSource(s.getInputStream()).getLines()
val out = new PrintStream(s.getOutputStream())

while (true) {
  val str = in.next()
  if (str == "filtro") {
    out.println("Starting Job. Please Wait")
    val a = in.next() // extra line sent by the client (currently unused)
    graph.vertices.filter {
      case (id, (followers_count, lang)) => followers_count > 10000
    }.collect.foreach {
      case (id, (followers_count, lang)) => out.println(s"$id has $followers_count")
    }
    out.println("Job Finished")
    out.flush()
  } else if (str == "filtro2") {
    out.println("Starting Job. Please Wait")
    val a = in.next() // extra line sent by the client (currently unused)
    graph.vertices.filter {
      case (id, (followers_count, lang)) => lang == "es"
    }.collect.foreach {
      case (id, (followers_count, lang)) => out.println(s"$id has $followers_count")
    }
    out.println("Job Finished")
    out.flush()
  } else {
    // echo anything that is not a recognized command
    out.println(str)
    out.flush()
  }
}
s.close()

As you can see, my prototype Scala script "is listening", and when it receives the expected command, it runs the corresponding job. I'm pretty sure this should be done in another way, but I can't find out how.

Upvotes: 0

Views: 793

Answers (1)

DPM

Reputation: 1581

It sounds like you've implemented it in your very simple socket listener application, though it's hard to be sure without seeing any code.

In general, as long as your SparkContext is around, any RDDs associated with it can still be around as well, so if you persist them they will be available for further use.

Later tasks can take advantage of the persisted RDDs to avoid redoing some work.
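
As a minimal sketch (not from the original answer) of what that looks like in a long-lived driver: the expensive data is loaded and persisted once at startup, and each incoming request then runs its filter against the cached RDD instead of re-reading the source files. The load path and the vertex schema (id, (followers_count, lang)) are assumptions taken from the question's code.

// Sketch only: one long-lived SparkContext, with the vertex data persisted
// so that every request reuses it. The input path is hypothetical.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(new SparkConf().setAppName("long-lived-driver"))

// Build the expensive RDD once, at startup, and persist it.
val vertices = sc
  .textFile("hdfs:///data/vertices.csv")
  .map { line =>
    val Array(id, followers, lang) = line.split(",")
    (id.toLong, (followers.toInt, lang))
  }
  .persist(StorageLevel.MEMORY_AND_DISK)

// Each request now triggers a job against the cached data; nothing is re-read.
def filtro(): Array[(Long, (Int, String))] =
  vertices.filter { case (_, (followers, _)) => followers > 10000 }.collect()

def filtro2(): Array[(Long, (Int, String))] =
  vertices.filter { case (_, (_, lang)) => lang == "es" }.collect()

As long as the driver (and its SparkContext) stays up, filtro() and filtro2() can be called from a request loop like the one in the question any number of times without paying the loading cost again.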

Upvotes: 0
