user3248346

Partitioning of a Stream

I'm not sure if this is possible, but I want to partition a stream based on a condition that depends on the output of the stream itself. An example should make this clearer.

I will create a list of orders and turn it into a stream. In the actual use case the orders arrive as a stream, so neither the next order nor the full list of orders is known up front:

scala> case class Order(item : String, qty : Int, price : Double)
defined class Order

scala> val orders = List(Order("bike", 1, 23.34), Order("book", 3, 2.34), Order("lamp", 1, 9.44), Order("bike", 1, 23.34))
orders: List[Order] = List(Order(bike,1,23.34), Order(book,3,2.34), Order(lamp,1,9.44), Order(bike,1,23.34))

Now I want to partition these orders into one set that contains duplicate orders and another set that contains unique orders. In the above example, forcing the stream should produce two streams: one with the two orders for a bike (since they are the same) and another containing all the other orders.

I tried the following:

First I created the partitioning function:

scala> def matchOrders(o : Order, s : Stream[Order]) = s.contains(o)
matchOrders: (o: Order, s: Stream[Order])Boolean

Then I tried to apply it to the stream:

scala> val s : (Stream[Order], Stream[Order]) = orders.toStream.partition(matchOrders(_, s._1))

I got a NullPointerException, I guess because s._1 is empty initially? I'm not sure. I've tried other ways but I'm not getting very far. Is there a way to achieve this partitioning?

Upvotes: 2

Views: 159

Answers (2)

Alexey Romanov

Reputation: 170735

Note that you can only know that an order has no duplicates after your stream finishes. Since the standard Stream constructors require you to know whether the stream is empty, they aren't lazy enough here: you have to force the original stream before you can even begin building the no-duplicates stream. And once you do that, Helder Pereira's answer applies.
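To illustrate the distinction, here is a minimal sketch (not part of the original answer) of what can be decided lazily: whether an order repeats one that was already seen. The helper name `tagRepeats` and the use of an `Iterator` as the incoming source are assumptions made for the example. It is meant to be pasted into the REPL or run as a script.

```scala
import scala.collection.mutable

final case class Order(item: String, qty: Int, price: Double)

// Hypothetical helper: lazily pairs each order with a flag saying whether
// an equal order has already been seen earlier in the input.
def tagRepeats(orders: Iterator[Order]): Iterator[(Order, Boolean)] = {
  val seen = mutable.Set.empty[Order]
  orders.map { o =>
    val repeated = !seen.add(o) // add returns false if o was already in the set
    (o, repeated)
  }
}

val incoming = Iterator(
  Order("bike", 1, 23.34),
  Order("book", 3, 2.34),
  Order("lamp", 1, 9.44),
  Order("bike", 1, 23.34)
)

// Forcing the iterator here only to inspect the result.
val tagged = tagRepeats(incoming).toList
```

Only the second bike order is flagged. The first one cannot be classified as a duplicate when it arrives, which is why a complete duplicates/uniques split needs the whole input.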

Upvotes: 1

Helder Pereira

Reputation: 5756

That would not work anyway: by the time you process a duplicate, the first occurrence of that Order has already gone to the unique Stream.

The best way is to create a Map[Order, Boolean] that tells you whether an Order appears more than once in the original orders list.

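// true for every Order that occurs more than once in the original list
val matchOrders = orders.groupBy(identity).mapValues(_.size > 1)
// s._1 holds the duplicated orders, s._2 the unique ones
val s : (Stream[Order], Stream[Order]) = orders.toStream.partition(matchOrders(_))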
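For reference, a self-contained variant of the same idea might look like the sketch below. It is an adaptation, not the code above: it keeps a plain List instead of a Stream (the input has to be fully traversed to count occurrences anyway) and uses a Set of duplicated orders in place of the Map. The names `duplicated`, `duplicates` and `uniques` are chosen for the example. It is meant to be pasted into the REPL or run as a script.

```scala
final case class Order(item: String, qty: Int, price: Double)

val orders = List(
  Order("bike", 1, 23.34),
  Order("book", 3, 2.34),
  Order("lamp", 1, 9.44),
  Order("bike", 1, 23.34)
)

// Orders that occur more than once in the complete list.
val duplicated: Set[Order] =
  orders.groupBy(identity).filter(_._2.size > 1).keySet

// partition keeps the original order of elements on both sides.
val (duplicates, uniques) = orders.partition(duplicated.contains)
```

With the sample data, `duplicates` holds the two bike orders and `uniques` holds the book and the lamp.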

Upvotes: 2
