learning_spark

Reputation: 669

Scala technique to organize the code

I am learning Scala. Right now, I am re-organizing a program of mine and need help.

Suppose my program has two parts. The output of the first part is used as the input of the second part. The first part creates the following data structures: two arrays, two matrices, one double, etc.

The second part of the program uses the above data structures (and also some extra files) and finally writes the output to one or more files.

What is the best way to organize the program? How do I keep everything in memory, but still “pass” the data structures from the first part to the second part? I do not want to write the output of the first part to files and read them again.

Thanks and regards,

Upvotes: 0

Views: 167

Answers (2)

AmigoNico

Reputation: 6862

One way to do it would be to pass the info in a tuple:

object MyApp extends App {
  def part1 = {  // '=' is required; 'def part1 { ... }' would return Unit and discard the tuple
    // do lots of work, perhaps using 'args' off the command line
    (array1, array2, matrix1, matrix2, dbl, ...)
  }
  def part2(p1: (<types for all of those things>)): Unit = {
    // do more work
  }
  part2(part1)
}
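
To make the tuple option concrete, here is a minimal runnable sketch. The names `TupleDemo` and `Part1Out`, the parameter `n`, and the toy computations are all made up for illustration; your real arrays and matrices will have other types.

```scala
object TupleDemo {
  // Hypothetical result shape: two arrays, one matrix, one double.
  type Part1Out = (Array[Int], Array[Double], Array[Array[Double]], Double)

  // Note the '=': with 'def part1(n: Int) { ... }' the tuple would be discarded.
  def part1(n: Int): Part1Out = {
    val ids      = Array.tabulate(n)(i => i + 1)
    val weights  = Array.fill(n)(0.5)
    val identity = Array.tabulate(n, n)((i, j) => if (i == j) 1.0 else 0.0)
    (ids, weights, identity, weights.sum)
  }

  // The second part receives everything as one argument and unpacks it.
  def part2(in: Part1Out): Double = {
    val (ids, _, matrix, total) = in
    val trace = matrix.indices.map(i => matrix(i)(i)).sum
    ids.sum + trace + total
  }

  def main(args: Array[String]): Unit =
    println(part2(part1(3)))
}
```

Nothing is written to disk here: the tuple only holds references to the arrays, so passing it from `part1` to `part2` does not copy the data.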

Or of course you could create a case class to hold the info; part1 would create an instance of the case class and part2 would take an instance as an argument:

object MyApp extends App {
  case class Part1Result(array1: ..., array2: ..., ...)
  def part1 = {  // '=' is required so that the Part1Result instance is actually returned
    // do lots of work, perhaps using 'args' off the command line
    Part1Result(array1, array2, matrix1, matrix2, dbl, ...)
  }
  def part2(result: Part1Result): Unit = {
    // do more work
  }
  part2(part1)
}
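
A filled-in version of the case class option might look like the sketch below. The field names (`labels`, `counts`, `matrix`, `scale`) and the word-counting logic are hypothetical stand-ins for whatever your first part really computes.

```scala
object CaseClassDemo {
  // Hypothetical fields standing in for the question's arrays, matrices and double.
  final case class Part1Result(
    labels: Array[String],
    counts: Array[Int],
    matrix: Array[Array[Double]],
    scale: Double
  )

  // First part: builds everything in memory and returns it as one value.
  def part1(words: Seq[String]): Part1Result = {
    val labels = words.distinct.toArray
    val counts = labels.map(l => words.count(_ == l))
    val matrix = Array.fill(labels.length, 2)(1.0)
    Part1Result(labels, counts, matrix, scale = 2.0)
  }

  // Second part: depends only on its argument, never on part1 itself.
  def part2(r: Part1Result): Seq[String] =
    r.labels.zip(r.counts).map { case (l, c) => s"$l=${c * r.scale}" }.toSeq
}
```

Because `part2` only sees a `Part1Result`, a test can hand it an instance built by hand, e.g. `part2(Part1Result(Array("x"), Array(3), Array(Array(0.0)), 0.5))`, without running `part1` at all.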

Or of course part1 could explicitly call part2 with multiple parameters.

Or you could capture the results from part1 in globals:

object MyApp extends App {
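  def part1 = {  // '=' is required; otherwise part1 returns Unit and the val below cannot be destructured
    // do lots of work, perhaps using 'args' off the command line
    (array1, array2, matrix1, matrix2, dbl, ...)
  }
  val (array1, array2, matrix1, matrix2, dbl, ...) = part1
  def part2: Unit = {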
    // do more work, accessing the globals
  }
  part2
}

Of course, if you just want to write the code inline, then there's no reason for the defs:

object MyApp extends App {
  val (array1, array2, matrix1, matrix2, dbl, ...) = {
    // do lots of work, perhaps using 'args' off the command line
    (array1, array2, matrix1, matrix2, dbl, ...)
  }
  // do more work, accessing the globals
}

But I wouldn't recommend the use of globals to hold the results, because accessing globals will make it hard to write unit tests for the second part.

In all likelihood the two parts correspond to classes defined elsewhere, so perhaps what you want is something like

object MyApp extends App {
  new Part2(new Part1(<args off the command line>).result)
}
class Part1(args: Array[String]) {
  ...
  def result = (array1, array2, ...)
}
class Part2(p1Result: (...)) {
}

where Part1#result returns either a tuple or a case class.

In fact, rather than call a result method on your Part1 instance and get a result object back, the Part1 object itself could have accessors for the results:

object MyApp extends App {
  new Part2(new Part1(<args off the command line>))
}
class Part1(args: Array[String]) {
  val array1 = ...
  val array2 = ...
  ...
}
class Part2(p1: Part1) {
  // lots of work, referencing p1.array1, etc.
}

As you can see you have lots of options!

You will surely want your parts to be independently testable: you will test that given particular sets of inputs, they do the right thing. For that reason you will not want to have the second part call the first part directly, e.g.

def part2 = {
  val p1Results = part1  // oops, don't do this!
  ...
}

because that would mean that the only way you could test part2 with a particular set of inputs is to figure out how to get part1 to produce those inputs as outputs. That's a bummer -- what you want is for each part to take its inputs as arguments, so that in a unit test you can just pass in whatever you want. If the first part returns a simple data object and the second part takes such an object as an argument, then unit testing is easy. You could still do unit testing if a Part1 instance is passed to a Part2 instance as an argument, but you'd have to define the argument to Part2 in terms of a trait that Part1 implements, so that you could provide a different implementation in testing.
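
The trait idea from the last sentence can be sketched roughly like this. `Part1Output`, `StubPart1`, and the members `values` and `factor` are invented names; the point is only that `Part2` depends on the trait rather than on the concrete `Part1`.

```scala
object TraitDemo {
  // Hypothetical interface: only what Part2 needs to read from the first part.
  trait Part1Output {
    def values: Array[Double]
    def factor: Double
  }

  // The real first part, which does the (possibly expensive) work.
  class Part1(n: Int) extends Part1Output {
    val values: Array[Double] = Array.tabulate(n)(i => i.toDouble)
    val factor: Double = 2.0
  }

  // The second part is written against the trait, not against Part1.
  class Part2(p1: Part1Output) {
    def total: Double = p1.values.sum * p1.factor
  }

  // A stand-in for unit tests: any inputs you like, no heavy work.
  class StubPart1(val values: Array[Double], val factor: Double) extends Part1Output
}
```

In production code you would write `new Part2(new Part1(4))`; in a test, `new Part2(new StubPart1(Array(1.0, 2.0), 10.0))` feeds Part2 exactly the inputs you want to check.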

Probably the easiest way to create a readily testable solution is to have Part1 produce an instance of a case class that gets fed to Part2 as an argument (the case class solution mentioned earlier).

Upvotes: 1

GKV

Reputation: 892

The best way to do this is with case classes.

object HeavyLifting {
  def part1 = {  // '=' is required so that the case class instance is returned
    // do the heavy lifting
    CaseClassExample(array1, array2, matrix1, matrix2, dbl, ...)
  }

  def part2: Unit = {
    val data = part1    // an instance of the case class

    data.array1         // individual arrays, matrices, etc. are read as fields
    data.matrix1
  }
}
case class CaseClassExample(array1: Array[Int], ..... )   // declare one field for each piece of data you want to carry


// Case classes are also helpful for pattern matching.
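
As a small illustration of that last point, a case class can be taken apart directly in a `match`. The `Result` class and `describe` method here are hypothetical, not part of the code above.

```scala
object MatchDemo {
  // Hypothetical result type with one array and one double.
  final case class Result(xs: Array[Int], scale: Double)

  // The case class's generated extractor lets us bind its fields in patterns.
  def describe(r: Result): String = r match {
    case Result(xs, _) if xs.isEmpty => "empty"
    case Result(xs, scale)           => s"${xs.length} values, scale $scale"
  }
}
```

For example, `MatchDemo.describe(MatchDemo.Result(Array(1, 2), 0.5))` binds `xs` and `scale` without any explicit field access.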

Upvotes: 0
