Santosh Bobade
Santosh Bobade

Reputation: 39

Scala MapReduce Filter

Is there any way of doing the following in Scala?

Say I have an array of Double of size 15:

[10,20,30,40,50,60,70,80,Double.NaN,Double.NaN,110,120,130,140,150]

I would like to replace all the Double.NaN (from left to right) by the average of the last four values in the array using map reduce. So the first Double.NaN gets replaced by 60, and the next Double.NaN is replaced by 64 (i.e., the previously calculated 60 at the index 8 is used in this calculation).

So far I have used function type parameters to get the positions of the Double.NaN.

Upvotes: 0

Views: 319

Answers (2)

Sledge
Sledge

Reputation: 178

Note that averaging the last four values of the first NaN from the given example (50,60,70,80) gives 65, not 60. The last five will give 60.

Does it have to be a map-reduce? How about a fold?

(List[Double]() /: listOfDoubles)((acc: List[Double], double: Double) => {(if (double.isNaN)
  acc match {
    case Nil => 0.0 // first double in the list
    case _ => {
      val last5 = acc.take(5)
      (0.0 /: last5)(_ + _) / last5.size // in case there's only a last 1, 2, 3, or 4 instead of 5
      }
    }
else double) :: acc}).reverse

Upvotes: 0

Andrey Tyukin
Andrey Tyukin

Reputation: 44967

I'm not sure what exactly you mean by "map-reduce" in this case. It looks rather like a use-case for scanLeft:

import scala.collection.immutable.Queue
val input = List[Double](
  10,20,30,40,50,60,70,80,Double.NaN,
  Double.NaN,110,120,130,140,150
)
val patched = input.
  scanLeft((Queue.fill(5)(0d), 0d)){ 
    case ((q, _), x) => { 
      val y = if (x.isNaN) q.sum / 5 else x; 
      (q.dequeue._2.enqueue(y), y)
    }
  }.unzip._2.tail

Creates result:

List(10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 60.0, 64.0, 110.0, 120.0, 130.0, 140.0, 150.0)

In general, unless the gaps are "rare", this would not work with a typical map-reduce workflow, because

  • Every value in the resulting list can depend on arbitrary many values to the left of it, therefore you cannot cut the dataset up in independent blocks and map them independently.
  • You are not reducing anything, you want a patched list back

If you are not mapping and not reducing, I wouldn't call it "map-reduce".

By the way: the above code works for any (positive integer) value of "5".

Upvotes: 1

Related Questions