Scala: Casting results of groupBy(_.getClass)

Question

In this hypothetical, I have a list of operations to be executed. Some of the operations in that list will be more efficient if they can be batched together (e.g., looking up different rows from the same table in a database).

trait Result
trait BatchableOp[T <: BatchableOp[T]] {
  def resolve(batch: Vector[T]): Vector[Result]
}

Here we use F-bounded polymorphism to let the implementation of an operation refer to its own type, which is highly convenient.

However, this poses a problem when it comes time to execute:

def execute(operations: Vector[BatchableOp[_]]): Vector[Result] = {
  def helper[T <: BatchableOp[T]](clazz: Class[T], batch: Vector[T]): Vector[Result] =
    batch.head.resolve(batch)

  operations
    .groupBy(_.getClass)
    .toVector
    .flatMap { case (clazz, batch) => helper(clazz, batch)}
}

This results in a compiler error stating inferred type arguments [BatchableOp[_]] do not conform to method helper's type parameter bounds [T <: BatchableOp[T]].

How can the Scala compiler be convinced that the group is all of the same type (which is a subclass of BatchableOp)?

One workaround is to specify the type explicitly, but in this case the type is unknown.
Another workaround relies on enumerating the child types, but I'd like to not have to update the execute method after implementing a new BatchableOp type.

Andrey Tyukin · Accepted Answer

I would like to approach the question systematically, so that the same solution strategy can be applied in similar cases.

First, an obvious remark: you want to work with a vector. The content of the vector can be of different types. The length of the vector is not limited. The number of types of entries of the vector is not limited. Therefore, the compiler cannot prove everything at compile time: you will have to use something like asInstanceOf at some point.

Now to the solution of the actual question:

The following compiles under Scala 2.12.4:

import scala.language.existentials

trait Result

type BOX = BatchableOp[X] forSome { type X <: BatchableOp[X] }

trait BatchableOp[C <: BatchableOp[C]] {
  def resolve(batch: Vector[C]): Vector[Result]

  // not abstract, needed only once!
  def collectSameClassInstances(batch: Vector[BOX]): Vector[C] = {
    for (b <- batch if this.getClass.isAssignableFrom(b.getClass))
    yield b.asInstanceOf[C]
  }

  // not abstract either, no additional hassle for subclasses!
  def collectAndResolve(batch: Vector[BOX]): Vector[Result] = 
    resolve(collectSameClassInstances(batch))
}

def execute(operations: Vector[BOX]): Vector[Result] = {

  operations
    .groupBy(_.getClass)
    .toVector
    .flatMap{ case (_, batch) =>
      batch.head.collectAndResolve(batch)
    }
}
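
To see the batching at work, here is a minimal usage sketch for Scala 2. It repeats the definitions above inside an object so that it is self-contained. The names `BatchDemo`, `Rows`, `UserLookup` and `OrderLookup` are hypothetical and only serve as an illustration; they are not part of the question.

```scala
import scala.language.existentials

object BatchDemo {
  trait Result
  // Hypothetical result type: which table was queried, and for which ids.
  final case class Rows(table: String, ids: Vector[Int]) extends Result

  type BOX = BatchableOp[X] forSome { type X <: BatchableOp[X] }

  // Same trait as in the answer, repeated to keep the sketch self-contained.
  trait BatchableOp[C <: BatchableOp[C]] {
    def resolve(batch: Vector[C]): Vector[Result]

    def collectSameClassInstances(batch: Vector[BOX]): Vector[C] =
      for (b <- batch if this.getClass.isAssignableFrom(b.getClass))
        yield b.asInstanceOf[C]

    def collectAndResolve(batch: Vector[BOX]): Vector[Result] =
      resolve(collectSameClassInstances(batch))
  }

  // Hypothetical operations: each one resolves a whole batch with a single "query".
  final case class UserLookup(id: Int) extends BatchableOp[UserLookup] {
    def resolve(batch: Vector[UserLookup]): Vector[Result] =
      Vector(Rows("users", batch.map(_.id)))
  }

  final case class OrderLookup(id: Int) extends BatchableOp[OrderLookup] {
    def resolve(batch: Vector[OrderLookup]): Vector[Result] =
      Vector(Rows("orders", batch.map(_.id)))
  }

  def execute(operations: Vector[BOX]): Vector[Result] =
    operations
      .groupBy(_.getClass)
      .toVector
      .flatMap { case (_, batch) => batch.head.collectAndResolve(batch) }

  val ops: Vector[BOX] = Vector(UserLookup(1), OrderLookup(7), UserLookup(2))

  // groupBy yields a Map, so the order of the groups is unspecified;
  // the results are therefore compared as a Set.
  val results: Set[Result] = execute(ops).toSet
}
```

Adding a further operation type in this sketch only requires a new class that extends `BatchableOp`; `execute` itself stays unchanged, which is what the question asked for.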

The main problem that I see here is that in Scala (unlike in some experimental dependently typed languages) there is no simple way to write down complex computations "under the assumption of existence of a type". Therefore, it seems difficult, if not impossible, to transform

Vector[BatchOp[T] forSome T]

into a

Vector[BatchOp[T]] forSome T

Here, the first type says: "it's a vector of batchOps, their types are unknown, and can be all different", whereas the second type says: "it's a vector of batchOps of unknown type T, but at least we know that they are all the same".
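
The difference between the two readings shows up as soon as a vector is passed to a helper with an F-bounded type parameter, like the one in the question. The following is a small sketch; `SameVsMixed`, `Count`, `UserLookup` and `OrderLookup` are hypothetical names used only for illustration.

```scala
object SameVsMixed {
  trait Result
  trait BatchableOp[T <: BatchableOp[T]] {
    def resolve(batch: Vector[T]): Vector[Result]
  }

  // Hypothetical result and operations, just enough to have two distinct types.
  final case class Count(n: Int) extends Result

  final case class UserLookup(id: Int) extends BatchableOp[UserLookup] {
    def resolve(batch: Vector[UserLookup]): Vector[Result] = Vector(Count(batch.size))
  }

  final case class OrderLookup(id: Int) extends BatchableOp[OrderLookup] {
    def resolve(batch: Vector[OrderLookup]): Vector[Result] = Vector(Count(batch.size))
  }

  // Needs ONE type T that describes every element of the vector.
  def helper[T <: BatchableOp[T]](batch: Vector[T]): Vector[Result] =
    batch.head.resolve(batch)

  // "All the same": a single T (here UserLookup) fits the whole vector.
  val same: Vector[UserLookup] = Vector(UserLookup(1), UserLookup(2))
  val ok: Vector[Result] = helper(same)

  // "Possibly all different": every element may carry its own type argument.
  val mixed: Vector[BatchableOp[_]] = Vector(UserLookup(1), OrderLookup(7))
  // helper(mixed) // rejected by the compiler: no single T conforms to the bound
}
```

After `groupBy(_.getClass)` every group is in fact of the first kind at runtime, but its static type is still of the second kind, which is why the compiler rejects the call.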

What you want is something like the following hypothetical language construct:

val vec1: Vector[BatchOp[T] forSome T] = ???
val vec2: Vector[BatchOp[T]] forSome T = 
  assumingExistsSomeType[C <: BatchOp[C]] yield {
    /* `C` now available inside this scope `S` */
    vec1.map(_.asInstanceOf[C])
  }

Unfortunately, we don't have anything like it for existential types: we can't introduce a helper type C in some scope S such that, when C is eliminated, we are left with an existential (at least I don't see a general way to do it).

Therefore, the only interesting question that is to be answered here is:

Given a Vector[BatchOp[X] forSome X] for which I know that there is one common type C such that the whole vector is actually a Vector[C], where is the scope in which this C is present as a usable type variable?

It turns out that BatchableOp[C] itself has a type variable C in scope. Therefore, I can add a method collectSameClassInstances to BatchableOp[C], and this method will actually have some type C available that it can use in its return type. Then I can immediately pass the result of collectSameClassInstances to the resolve method, and I get a completely benign Vector[Result] type as output.

Final remark: If you decide to write any code with F-bounded polymorphism and existentials, at least make sure that you have documented very clearly what exactly you are doing there, and how you will ensure that this combination does not escape into other parts of the codebase. It doesn't feel like a good idea to expose such interfaces to the users. Keep it localized, and make sure these abstractions do not leak anywhere.
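
One possible way to keep things localized is sketched below. Note that this is a variation and not the code from above: it swaps the existential type for a plain non-generic supertype, so that callers never see an F-bounded or existential type in the signature of `execute`. The names `Batching`, `Op`, `resolveUntyped`, `Demo`, `Count`, `UserLookup` and `OrderLookup` are hypothetical.

```scala
object Batching {
  trait Result

  // Non-generic face of an operation; callers of `execute` only see this type.
  sealed trait Op {
    private[Batching] def resolveUntyped(batch: Vector[Op]): Vector[Result]
  }

  trait BatchableOp[C <: BatchableOp[C]] extends Op {
    def resolve(batch: Vector[C]): Vector[Result]

    // The single unchecked cast lives here, where the type variable C is in scope.
    private[Batching] final def resolveUntyped(batch: Vector[Op]): Vector[Result] = {
      val cls = getClass
      resolve(batch.collect { case b if cls.isInstance(b) => b.asInstanceOf[C] })
    }
  }

  def execute(operations: Vector[Op]): Vector[Result] =
    operations
      .groupBy(_.getClass)
      .toVector
      .flatMap { case (_, batch) => batch.head.resolveUntyped(batch) }
}

object Demo {
  import Batching._

  // Hypothetical result and operations defined outside of `Batching`.
  final case class Count(kind: String, n: Int) extends Result

  final case class UserLookup(id: Int) extends BatchableOp[UserLookup] {
    def resolve(batch: Vector[UserLookup]): Vector[Result] =
      Vector(Count("users", batch.size))
  }

  final case class OrderLookup(id: Int) extends BatchableOp[OrderLookup] {
    def resolve(batch: Vector[OrderLookup]): Vector[Result] =
      Vector(Count("orders", batch.size))
  }

  // Group order is unspecified, so the results are collected into a Set.
  val results: Set[Result] =
    execute(Vector(UserLookup(1), OrderLookup(7), UserLookup(2))).toSet
}
```

In this sketch the cast and the class check are reachable only from inside `Batching`; code that implements a new operation sees nothing but `resolve`.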
