Rupam Bhattacharjee

Reputation: 389

How to take Nothing out of an inferred type

The idea comes from this video: https://www.youtube.com/watch?v=BfaBeT0pRe0&t=526s, where the speakers discuss achieving type safety by defining custom column types.

A possible trivial implementation is:

import org.apache.spark.SparkConf
import org.apache.spark.sql.{DataFrame, SparkSession}

trait Col[Self] { self: Self => }

trait Id extends Col[Id]
object IdCol extends Id

trait Val extends Col[Val]
object ValCol extends Val

trait Comment extends Col[Comment]
object CommentCol extends Comment

case class DataSet[Schema >: Nothing](df: DataFrame) {

  def validate[T1 <: Col[T1], T2 <: Col[T2]](
      col1: (Col[T1], String),
      col2: (Col[T2], String)
  ): Option[DataSet[Schema with T1 with T2]] = {
    val cols = df.columns.map(_.toLowerCase)
    // Some(...) if at least one of the two columns is present
    if (cols.contains(col1._2.toLowerCase) || cols.contains(col2._2.toLowerCase))
      Some(DataSet[Schema with T1 with T2](df))
    else None
  }
}

object SchemaTypes extends App {

  lazy val spark: SparkSession = SparkSession
    .builder()
    .config(
      new SparkConf()
        .setAppName(
          getClass()
            .getName()
        )
    )
    .getOrCreate()

  import spark.implicits._

  val df = Seq(
    (1, "a", "first value"),
    (2, "b", "second value"),
    (3, "c", "third value")
  ).toDF("Id", "Val", "Comment")

  val myData =
    DataSet/*[Id with Val with Comment]*/(df)
      .validate(IdCol -> "Id", ValCol -> "Val")

  myData match {
    case None => throw new java.lang.Exception("Required columns missing")
    case _    =>
  }
}

The type of myData is Option[DataSet[Nothing with Id with Val]]. That makes sense, since the constructor is called without a type parameter, but in the video they show the type as DataSet[T1 with T2], with no Nothing.

Of course, passing an explicit type argument in the invocation takes Nothing out, but it is redundant to specify the type parameter when the types are already carried by the argument list.

val myData =
  DataSet[Id with Val with Comment](df).validate(IdCol -> "Id", ValCol -> "Val")

Upvotes: 1

Views: 192

Answers (2)

Alexey Romanov

Reputation: 170835

Dmytro Mitin's answer is good, but I wanted to provide some more information.

If you write something like DataSet(df).validate(...), the type parameter of DataSet(df) is inferred first. Here it's Nothing because there's no information which would make it anything else. So the Schema is Nothing, and Schema with T1 with T2 (which appears in the return type of validate) is Nothing with Id with Val.
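The same effect can be seen with a minimal, hypothetical example (Box is not from the question, just an illustration): a type parameter that appears nowhere in the argument list has nothing constraining it, so it is inferred as Nothing.

```scala
object InferNothing extends App {
  // Hypothetical minimal case: A does not occur in the parameter list,
  // so the compiler has no information about it and infers A = Nothing.
  case class Box[A](value: Int)

  val b = Box(42)                  // inferred as Box[Nothing]
  val evidence: Box[Nothing] = b   // compiles only because b's static type is Box[Nothing]
  println(evidence.value)
}
```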

Upvotes: 2

Dmytro Mitin

Reputation: 51693

Types Id and Val can be inferred because IdCol and ValCol appear inside .validate, but Comment can't be, so it has to be supplied explicitly:

val myData =
  DataSet[Comment](df)
    .validate(IdCol -> "Id", ValCol -> "Val")

println(shapeless.test.showType(SchemaTypes.myData)) 
//Option[App.DataSet[App.Comment with App.Id with App.Val]]

https://scastie.scala-lang.org/yj0HnpkyQfCreKq8ZV4D7A

Actually if you specify DataSet[Id with Val with Comment](df) the type will be Option[DataSet[Id with Val with Comment with Id with Val]], which is equal (=:=) to Option[DataSet[Id with Val with Comment]].
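That equality can be checked with the compiler itself. A small sketch: conformance in both directions (which is what the equality amounts to here) is witnessed by <:<, since the duplicated components of the intersection add no information.

```scala
object IntersectionDedup extends App {
  trait Id; trait Val; trait Comment

  // Each type conforms to the other, so the duplicates are redundant.
  implicitly[(Id with Val with Comment with Id with Val) <:< (Id with Val with Comment)]
  implicitly[(Id with Val with Comment) <:< (Id with Val with Comment with Id with Val)]

  println("both directions compile")
}
```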


OK, I watched the video up to that time code. I guess the speakers were trying to convey an idea (combining F-bounded polymorphism, T <: Col[T], with intersection types, T with U), and you shouldn't take their slides literally; there can be inaccuracies in them.

First they show this slide:

case class DataSet[Schema](df: DataFrame) {   
  def validate[T <: Col[T]](
    col: (Col[T], String)
  ): Option[DataSet[Schema with T]] = ??? 
}

and this code can be illustrated with

val myDF: DataFrame = ???
val myData = DataSet[VideoId](myDF).validate(Country -> "country_code")
myData : Option[DataSet[VideoId with Country]]

Then they show this slide:

val myData = DataSet(myDF).validate(
  VideoId -> "video_id",
  Country -> "country_code",
  ProfileId -> "profile_id",
  Score -> "score"
)

myData : DataSet[VideoId with Country with ProfileId with Score]

but this illustrating code doesn't correspond to the previous slide. You would have to define

// actually we don't use Schema here
case class DataSet[Schema](df: DataFrame) {
  def validate[T1 <: Col[T1], T2 <: Col[T2], T3 <: Col[T3], T4 <: Col[T4]](
    col1: (Col[T1], String),
    col2: (Col[T2], String),
    col3: (Col[T3], String),
    col4: (Col[T4], String),
  ): DataSet[T1 with T2 with T3 with T4] = ???
}

So take it as an idea, not literally.

You can have something similar with

case class DataSet[Schema](df: DataFrame) {
  def validate[T <: Col[T]](
    col: (Col[T], String)
  ): Option[DataSet[Schema with T]] = ???
}

val myDF: DataFrame = ???

val myData = DataSet[Any](myDF).validate(VideoId -> "video_id").flatMap(
  _.validate(Country -> "country_code")
).flatMap(
  _.validate(ProfileId -> "profile_id")
).flatMap(
  _.validate(Score -> "score")
)

myData: Option[DataSet[VideoId with Country with ProfileId with Score]]
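Since each validate returns an Option, the chain above also reads naturally as a for-comprehension. A self-contained sketch (with a stub DataFrame and an always-succeeding validate, so it compiles outside Spark; the column names are just illustrative):

```scala
object ChainedValidate extends App {
  type DataFrame = Unit // stand-in: the real type comes from Spark

  trait Col[Self] { self: Self => }
  trait VideoId extends Col[VideoId]; object VideoId extends VideoId
  trait Country extends Col[Country]; object Country extends Country

  case class DataSet[Schema](df: DataFrame) {
    def validate[T <: Col[T]](col: (Col[T], String)): Option[DataSet[Schema with T]] =
      Some(DataSet[Schema with T](df)) // stub: real code would inspect df.columns
  }

  // The flatMap chain, written as a for-comprehension; each step adds one
  // column type to the accumulated schema.
  val myData: Option[DataSet[Any with VideoId with Country]] =
    for {
      d1 <- DataSet[Any](()).validate(VideoId -> "video_id")
      d2 <- d1.validate(Country -> "country_code")
    } yield d2

  println(myData.isDefined)
}
```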

Upvotes: 3
