Scala Unit test: how to validate the returned RDD

Question

I have written a method to filter out duplicates from an RDD and decided to write a unit test for the method. Here is my method:

  def filterDupes(salesWithDupes: RDD[((String, String), SalesData)]): RDD[((String, String), SalesData)] = {
    salesWithDupes.map(sale => ((sale._2.saleType, sale._2.saleDate), sale))
      .reduceByKey((a, _) => a)
      .map(_._2)
  }

Since this is my first time writing a test in Scala, I've run into a few difficulties. Am I passing the elements from the list to the filtering method correctly?

Now I'm stuck on how to validate the result returned from the method. The only approach I've come up with so far is collecting the RDD's data into a list and then checking its size. Is that the right way?

Here is how I see the logic of the test:

"Sales" should "be filtered" in {

    Given("Sales RDD")

    val rddWithDupes = sc.parallelize(Seq(
      (("metric1", "metric2"), createSale("1", saleType = "Type1", saleDate = "2014-10-12")),
      (("metric1", "metric2"), createSale("2", saleType = "Type1", saleDate = "2014-10-12")),
      (("metric1", "metric2"), createSale("3", saleType = "Type3", saleDate = "2010-11-01"))
    ))

    When("Sales RDD is filtered")

    val filteredResult = SalesProcessor.filterDupes(rddWithDupes).collect.toList

    Then("Sales are filtered")
    filteredResult.size should be(2)
    ????
  }
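
One possible way to fill in the missing assertions: once the RDD is collected, the result is an ordinary Scala list, so it can be checked for more than its size. The sketch below is only an illustration on plain collections, with no Spark involved. The SalesData case class and its field names are assumptions (the question does not show them), and filterDupesLocal is a hypothetical stand-in for SalesProcessor.filterDupes(rddWithDupes).collect.toList. It checks the size, that no (saleType, saleDate) pair is repeated, and that exactly the expected pairs survive.

```scala
// Hypothetical stand-in for the SalesData class from the question; field names are assumed.
final case class SalesData(id: String, saleType: String, saleDate: String)

object FilterDupesCheck {
  type Keyed = ((String, String), SalesData)

  // Plain-collections model of filterDupes: keep one record per (saleType, saleDate).
  // In the real test this would be SalesProcessor.filterDupes(rddWithDupes).collect.toList.
  def filterDupesLocal(sales: List[Keyed]): List[Keyed] =
    sales
      .groupBy { case (_, sale) => (sale.saleType, sale.saleDate) }
      .values
      .map(_.head)
      .toList

  // Same three records as in the question: two share a type and date, one is unique.
  val input: List[Keyed] = List(
    (("metric1", "metric2"), SalesData("1", "Type1", "2014-10-12")),
    (("metric1", "metric2"), SalesData("2", "Type1", "2014-10-12")),
    (("metric1", "metric2"), SalesData("3", "Type3", "2010-11-01"))
  )

  val filteredResult: List[Keyed] = filterDupesLocal(input)

  // Surviving (saleType, saleDate) pairs, kept as a list so a repeat would still be visible.
  val survivingKeys: List[(String, String)] =
    filteredResult.map { case (_, sale) => (sale.saleType, sale.saleDate) }

  def main(args: Array[String]): Unit = {
    // Two distinct (saleType, saleDate) pairs are expected.
    assert(filteredResult.size == 2)
    // No pair appears twice in the output.
    assert(survivingKeys.distinct.size == survivingKeys.size)
    // Exactly the expected pairs survive, regardless of output order.
    assert(survivingKeys.toSet == Set(("Type1", "2014-10-12"), ("Type3", "2010-11-01")))
    // Every surviving record comes from the input unchanged.
    assert(filteredResult.forall(input.contains))
    println("all checks passed")
  }
}
```

In the ScalaTest version the same idea could be written with matchers, for example comparing the set of surviving (saleType, saleDate) pairs with should be(Set(...)). The checks deliberately compare keys rather than the sale id: as far as I know, reduceByKey((a, _) => a) does not guarantee which of the two duplicates is kept, and collect does not guarantee the order of the output, so an assertion on id "1" or on list position would likely be brittle.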
