Vishaal Kalwani

Reputation: 740

Understanding covariance in my Scala code

I am working through the correct syntax and structure for the following problem.

I have two datasets with two separate schemas (call them ClientEvent and ServerEvent) stored on disk. The codebase I am working on has defined a class, Reader[T <: Asset], where ClientEvent and ServerEvent are subtypes of Asset. Asset is a trait.

I am writing a function:

def getPathAndReader(config): (String, Reader[Asset]) = {
    if (config.readClient) {
        return getClientPathAndReader(config)
    } else {
        return getServerPathAndReader(config)
    } 
}

This does not compile in my Scala code. From my understanding, T must be a subtype of Asset, which both ServerEvent and ClientEvent are, therefore Reader[ServerEvent] <: Reader[Asset]. But since functions are covariant in their inputs, the function I wrote cannot just return this lower type, I'd have to cast it to a supertype? Does that lose too much information?

load is a function on the trait Reader:

trait Reader[T <: Asset] {
  def load(raw: DataFrame): Dataset[T]
}

What would be an alternative way to structure this code?

The code's intent is to take the file path returned and call Reader::load(filePath: String) to get data back. The subtyped readers have some internal logic to clean the data they retrieve from disk before it is returned as a DataFrame, so each reader relies on the type it is parameterized with. I come from a C++/C# background, so my thinking is that if you have a generic Reader[Asset] and call Reader::load(path: String), it will know what to do based on the type it actually is, similar to holding a Base* ptr and calling a derived method.

Upvotes: 1

Views: 249

Answers (1)

SergGr

Reputation: 23788

Your claim, "From my understanding, T must be a subtype of Asset, which both ServerEvent and ClientEvent are, therefore Reader[ServerEvent] <: Reader[Asset]", is not correct. In general, if A and B are ordinary types such that A <: B, and G[T] is a generic type, then all three of the following cases are possible:

  • Covariant case, G[A] <: G[B]: a typical example is a read-only collection such as Iterator.
  • Contravariant case, G[A] >: G[B]: a typical example is some kind of consumer, such as a function T => Unit.
  • Invariant case, where G[A] and G[B] are not related: this typically arises when some uses of T are covariant and some are contravariant. For example, a simple mapping function T => T is invariant in T. Most mutable collections are invariant as well, because they both "produce" and "consume" objects.
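To make the three cases concrete, here is a minimal sketch using only the Scala standard library. The Asset and ClientEvent definitions are placeholders standing in for the types in the question, not the real ones from your codebase.

```scala
import scala.collection.mutable.ArrayBuffer

object VarianceDemo {
  // Placeholder types standing in for the ones in the question.
  trait Asset
  final case class ClientEvent(id: Int) extends Asset

  // Covariant: List is declared as List[+A], so List[ClientEvent] <: List[Asset].
  val events: List[ClientEvent] = List(ClientEvent(1))
  val assets: List[Asset] = events

  // Contravariant: Function1 is declared as Function1[-T1, +R],
  // so (Asset => Unit) <: (ClientEvent => Unit).
  val handleAsset: Asset => Unit = _ => ()
  val handleClient: ClientEvent => Unit = handleAsset

  // Invariant: ArrayBuffer[A] both produces and consumes A,
  // so ArrayBuffer[ClientEvent] and ArrayBuffer[Asset] are unrelated.
  val buffer: ArrayBuffer[ClientEvent] = ArrayBuffer(ClientEvent(2))
  // val widened: ArrayBuffer[Asset] = buffer // does not compile: type mismatch
}
```

The commented-out line is the same situation as returning a Reader[ClientEvent] where a Reader[Asset] is expected.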

Unfortunately for you, Dataset[T] is invariant (rather than covariant, Dataset[+T], or contravariant, Dataset[-T]). Because load returns a Dataset[T], this effectively makes your Reader invariant as well. As for how to work around this, it is hard to advise without understanding the larger context. For example, why do the readers from getClientPathAndReader and getServerPathAndReader not return Dataset[Asset]? If you really do go on to use the specific ServerEvent and ClientEvent types, then your design is not type-safe anyway. If you use only Asset, then changing your readers to return Dataset[Asset] seems the easiest solution.
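A rough sketch of that last option, assuming the callers only ever need Asset. To keep it runnable without Spark, Rows[T] is a made-up invariant stand-in for Dataset[T], Vector[String] stands in for DataFrame, and Config, the file paths, and the trim/toInt "cleaning" logic are all hypothetical placeholders.

```scala
object ReaderSketch {
  trait Asset { def id: Int }
  final case class ClientEvent(id: Int) extends Asset
  final case class ServerEvent(id: Int) extends Asset

  // Hypothetical invariant container standing in for Spark's Dataset[T].
  final class Rows[T](val values: Vector[T])

  // Same shape as the Reader in the question; Vector[String] stands in for DataFrame.
  trait Reader[T <: Asset] {
    def load(raw: Vector[String]): Rows[T]
  }

  // Each reader keeps its own cleaning logic but is typed as Reader[Asset].
  object ClientReader extends Reader[Asset] {
    def load(raw: Vector[String]): Rows[Asset] =
      new Rows[Asset](raw.map(s => ClientEvent(s.trim.toInt)))
  }

  object ServerReader extends Reader[Asset] {
    def load(raw: Vector[String]): Rows[Asset] =
      new Rows[Asset](raw.map(s => ServerEvent(s.trim.toInt)))
  }

  // Hypothetical config type; the paths below are placeholders too.
  final case class Config(readClient: Boolean)

  // Both branches are now Reader[Asset], so the declared return type checks.
  def getPathAndReader(config: Config): (String, Reader[Asset]) =
    if (config.readClient) ("/data/client", ClientReader)
    else ("/data/server", ServerReader)
}
```

Calling load on the returned reader still dispatches dynamically to ClientReader or ServerReader, much like a virtual call through a Base* in C++. What is given up is the static knowledge of which event type comes back.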

Upvotes: 2
