Reputation: 23
I have an issue similar to this one (In Scala, is it possible to “curry” type parameters of a def?), but I don't know how to resolve it using the solution given there.
As you can see below, my current implementation does not allow the type parameter to be inferred (the generic type U has to be provided explicitly).
trait Block[U] {
def map(df: DataFrame, params: U): DataFrame
}
case class ParseURL() extends Block[(String, Column)] {
override def map(df: DataFrame, params: (String, Column)): DataFrame = ??? // body omitted
}
class Pipeline(df: DataFrame) {
...
def copy(newDf: DataFrame) = new Pipeline(newDf)
...
def map[T <: Block[U] : ClassTag, U](d: U): Pipeline = {
val block: T = implicitly[ClassTag[T]].runtimeClass.newInstance.asInstanceOf[T]
this.copy(block.map(df, d))
}
...
}
Here is my current use of this implementation:
val pipeline = new Pipeline(df).map[ParseURL, (String, Column)]("url", $"url")
But I would like to call the map method like this:
val pipeline = new Pipeline(df).map[ParseURL]("url", $"url")
I think it might be possible with an anonymous class but any help would be appreciated :)
EDIT: Also, I don't know if this article should inspire me.
Upvotes: 1
Views: 296
Reputation: 23
Actually, my first implementation was to create a block registry that could be reused in my pipeline class. But as you can see, that solution is not ideal for me because I have to register each block explicitly, and I would prefer to avoid the redundancy.
trait Block {
type Parameters
// WARNING: This function is used only by the pipeline. It casts the block parameters here so that
// implementations do not need any cast
def mapDf[T <: Block : ClassTag](df: DataFrame, params: Any): DataFrame = {
this.map[T](df, params.asInstanceOf[Parameters])
}
// Abstract function that processes a dataframe
def map[T <: Block : ClassTag](df: DataFrame, params: Parameters): DataFrame
}
case class ParseURL() extends Block {
override type Parameters = (String, Column)
override def map[T <: Block : ClassTag](df: DataFrame, params: Parameters): DataFrame = {...}
}
class Pipeline(df: DataFrame) {
...
def copy(newDf: DataFrame) = new Pipeline(newDf)
...
def map[T <: Block : ClassTag](d: T#Parameters): Pipeline = {
this.copy(registry.lookupRegistry[T].mapDf(df, d))
}
...
}
case class NoSuchBlockException(declaredBlock: Class[_])
extends Exception(s"No block registered $declaredBlock in current registry")
class BlockRegistry {
var registry: Map[ClassTag[_ <: Block], _ <: Block] = Map()
def register[T <: Block : ClassTag](block: Block) = {
registry += (classTag[T] -> block)
this
}
def lookupRegistry[T <: Block : ClassTag]: Block = registry.get(classTag[T]) match {
case Some(block) => block
case _ => throw NoSuchBlockException(classTag[T].runtimeClass)
}
}
object BlockRegistry {
val registry: BlockRegistry = new BlockRegistry()
.register[ParseURL](ParseURL())
.register[CastColumn](CastColumn())
}
val pipeline = new Pipeline(df).map[ParseURL]("url", $"url")
Maybe turning Block from a trait into an abstract class would let me pass an implicit registry and have each block register itself at instantiation. But once again, the mechanism would be too complex.
Upvotes: 0
Reputation: 23788
I don't think you can easily apply the solution from the referenced question, because there is a dependency between your types T and U and it goes in the wrong direction: T depends on U, and U is the one you want to omit.
Here is another option that might help you. It is based on the idea of replacing the implicitly call with an explicit parameter that provides the type information to the compiler. The idea is to introduce a BlockFactory trait such as the following:
trait Block[U] {
def map(df: DataFrame, params: U): DataFrame
}
trait BlockFactory[T <: Block[U], U] {
def create(): T
}
class ParseURL extends Block[(String, Column)] {
override def map(df: DataFrame, params: (String, Column)): DataFrame = ???
}
object ParseURL extends BlockFactory[ParseURL, (String, Column)] {
override def create(): ParseURL = new ParseURL
}
class Pipeline(df: DataFrame) {
// ...
def copy(newDf: DataFrame) = new Pipeline(newDf)
// ...
def map[T <: Block[U] : ClassTag, U](blockFactory: BlockFactory[T, U], d: U): Pipeline = {
val block: T = blockFactory.create()
this.copy(block.map(df, d))
}
// ...
}
So you can use it as
val pipeline = new Pipeline(df).map(ParseURL, ("url", $"url"))
This idea should work fine if your typical Block implementation is non-generic, as ParseURL is. If you have a generic Block implementation, the usage does not look as nice:
class GenericBlock[U] extends Block[U] {
override def map(df: DataFrame, params: U): DataFrame = ???
}
class GenericBlockFactory[U] extends BlockFactory[GenericBlock[U], U] {
override def create(): GenericBlock[U] = ???
}
object GenericBlockFactory {
def apply[U](): GenericBlockFactory[U] = new GenericBlockFactory[U]
}
val pipelineGen = new Pipeline(df).map(GenericBlockFactory[(String, Column)](), ("url", $"url"))
You can improve it a bit by reversing the order of the arguments of map and then currying it, such as
class Pipeline(df: DataFrame) {
def map[T <: Block[U] : ClassTag, U](d: U)(blockFactory: BlockFactory[T, U]): Pipeline =
new Pipeline(blockFactory.create().map(df, d))
}
val pipelineGen = new Pipeline(df).map(("url", $"url"))(GenericBlockFactory())
This way you don't have to specify the generic types for GenericBlockFactory, but you still have to write () to call its apply. It feels less natural to me, but you save some typing.
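To make the factory idea easier to try out, here is a minimal self-contained sketch that runs without Spark. It rests on several assumptions that are not part of the original code: DataFrame is replaced by a plain Vector of Maps, and BlockFactorySketch and CopyColumn are hypothetical names invented for the illustration. The ClassTag bound is dropped because the factory creates the block directly.

```scala
object BlockFactorySketch {
  // Assumption: a stand-in for Spark's DataFrame so the sketch needs no dependencies.
  type DataFrame = Vector[Map[String, String]]

  trait Block[U] {
    def map(df: DataFrame, params: U): DataFrame
  }

  trait BlockFactory[T <: Block[U], U] {
    def create(): T
  }

  // Hypothetical block: copies the column `from` into a new column `to`.
  class CopyColumn extends Block[(String, String)] {
    override def map(df: DataFrame, params: (String, String)): DataFrame = {
      val (from, to) = params
      df.map(row => row + (to -> row.getOrElse(from, "")))
    }
  }

  // The companion object doubles as the factory, so it can be passed by name.
  object CopyColumn extends BlockFactory[CopyColumn, (String, String)] {
    override def create(): CopyColumn = new CopyColumn
  }

  class Pipeline(val df: DataFrame) {
    // Both T and U are inferred from the factory argument.
    def map[T <: Block[U], U](blockFactory: BlockFactory[T, U], d: U): Pipeline =
      new Pipeline(blockFactory.create().map(df, d))
  }

  val input: DataFrame = Vector(Map("url" -> "http://example.com"))
  val result: DataFrame = new Pipeline(input).map(CopyColumn, ("url", "copy")).df
}
```

Here result holds the input row with an extra copy column, and no type argument had to be written at the call site.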
Upvotes: 1
Reputation: 37832
There's a way to get something similar to what you're looking for, but it might be a bit clumsy and confusing for the reader, as it makes the eventual call to map look like this: .map(("url", col("url")))[ParseURL].
The idea here is to create an intermediate class returned from map (called Mapper here) that preserves the U type information and has a parameterless apply method that takes the T type argument:
class Pipeline(df: DataFrame) { self =>
def copy(newDf: DataFrame) = new Pipeline(newDf)
final class Mapper[U](d: U) {
def apply[T <: Block[U] : ClassTag]: Pipeline = {
val block: T = implicitly[ClassTag[T]].runtimeClass.newInstance.asInstanceOf[T]
self.copy(block.map(df, d))
}
}
def map[U](d: U): Mapper[U] = new Mapper(d)
}
val pipeline = new Pipeline(df).map(("url", col("url")))[ParseURL]
It does look weird, so take it or leave it :)
A slight alternative would be to rename apply to something else, say using, which would end up longer but perhaps clearer:
val pipeline = new Pipeline(df).map(("url", col("url"))).using[ParseURL]
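For anyone who wants to experiment with the using variant without Spark, a minimal self-contained sketch could look like the following. It assumes Scala 2 and makes several substitutions that are not in the original: DataFrame is a plain Vector of Maps, MapperSketch and UpperCase are hypothetical names, and getDeclaredConstructor().newInstance() stands in for Class.newInstance, which is deprecated since Java 9. As in the original, the block class must have a public no-argument constructor.

```scala
import scala.reflect.ClassTag

object MapperSketch {
  // Assumption: a stand-in for Spark's DataFrame so the sketch needs no dependencies.
  type DataFrame = Vector[Map[String, String]]

  trait Block[U] {
    def map(df: DataFrame, params: U): DataFrame
  }

  // Hypothetical block: upper-cases the values of the column named by `params`.
  class UpperCase extends Block[String] {
    override def map(df: DataFrame, params: String): DataFrame =
      df.map(row => row.updated(params, row.getOrElse(params, "").toUpperCase))
  }

  class Pipeline(val df: DataFrame) { self =>
    // Mapper remembers U (inferred from the argument) so that only T is left to spell out.
    final class Mapper[U](d: U) {
      def using[T <: Block[U]](implicit tag: ClassTag[T]): Pipeline = {
        // Reflective instantiation through the class's no-argument constructor.
        val block = tag.runtimeClass.getDeclaredConstructor().newInstance().asInstanceOf[T]
        new Pipeline(block.map(self.df, d))
      }
    }

    def map[U](d: U): Mapper[U] = new Mapper(d)
  }

  val input: DataFrame = Vector(Map("name" -> "ada"))
  val result: DataFrame = new Pipeline(input).map("name").using[UpperCase].df
}
```

The compiler rejects a call such as .map(42).using[UpperCase], because UpperCase is not a Block[Int], so the link between the parameters and the block type is still checked at compile time.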
Upvotes: 1