Yoel Garcia
Yoel Garcia

Reputation: 173

write polymorphic function that accept IndexedSeq[A] as well as ParVector[A]?

I want to write a polymorphic function that accepts either an IndexedSeq[A] or a ParVector[A]. Inside the function I want access to the prepend method i.e. +: which is in SeqLike. SeqLike like is a rather confusing type for me since it take a Repr which I sort of ignored, unsuccessfully of course.

def goFoo[M[_] <: SeqLike[_,_], A](ac: M[A])(p: Int): M[A] = ???

The function should accept an empty accumulator to start with and call itself recursively p times and each time prepend an A. Here is a concrete example

def goStripper[M[_] <: SeqLike[_,_]](ac: M[PDFTextStripper])(p: Int): M[PDFTextStripper] = {
  val str = new PDFTextStripper
  str.setStartPage(p)
  str.setEndPage(p)
  if (p > 1) goStripper(str +: ac)(p-1)
  else str +: ac
}

But of course this doesn't compile because I am missing something fundamental about SeqLike. Does anyone has a solution (preferably with an explanation for this?)

Thanks.

Upvotes: 2

Views: 143

Answers (1)

isomarcte
isomarcte

Reputation: 1981

Dealing with SeqLike[A, Repr] can be a bit difficult sometimes. You really need to have a good understanding of how the collections library works (This is a great article if you are interested, http://docs.scala-lang.org/overviews/core/architecture-of-scala-collections.html). Thankfully, in your case, you actually don't need to even worry about it too much. Both IndexedSeq[A] and ParVector[A] are subclasses of scala.collection.GenSeq[A]. So you can merely write your method as follows

Simple Solution

scala> def goFoo[A, B <: GenSeq[A] with GenSeqLike[A, B]](ac: B)(p: Int): B = ac
goFoo: [A, B <: scala.collection.GenSeq[A] with scala.collection.GenSeqLike[A,B]](ac: B)(p: Int)B

scala> goFoo[Int, IndexedSeq[Int]](IndexedSeq(1))(1)
res26: IndexedSeq[Int] = Vector(1)

scala> goFoo[Int, ParVector[Int]](new ParVector(Vector(1)))(1)
res27: scala.collection.parallel.immutable.ParVector[Int] = ParVector(1)

You need to enforce that B is both a subtype of GenSeq[A] and GenSeqLike[A, Repr] so that you can provide the correct value for the Repr. You also need to enforce that the Repr in GenSeqLike[A, Repr] is B. Otherwise some of the methods won't return the correct type. Repr is the underlying representation of the collection. To really understand it, you should read the article I linked, but you can think of it as the output type of the many of the collection operations, although that is very oversimplified. I talk about it more below, if you are really interested. For now, it suffices to say we want it to be the same type as the collection we are operating on.

Higher Kind Solution

Right now, the type system needs you to manually supply both generic parameters, which is fine, but we can do a little better. You can make this a little cleaner if you allow for higher kinds.

scala> import scala.language.higherKinds
import scala.language.higherKinds

scala> def goFoo[A, B[A] <: GenSeq[A] with GenSeqLike[A, B[A]]](ac: B[A])(p: Int): B[A] = ac
goFoo: [A, B[A] <: scala.collection.GenSeq[A] with scala.collection.GenSeqLike[A,B[A]]](ac: B[A])(p: Int)B[A]

scala> goFoo(IndexedSeq(1))(1)
res28: IndexedSeq[Int] = Vector(1)

scala> goFoo(new ParVector(Vector(1)))(1)
res29: scala.collection.parallel.immutable.ParVector[Int] = ParVector(1)

Now you don't have to worry about manually supplying the types.

Recursion

These solutions work with recursion as well.

scala> @tailrec
     | def goFoo[A, B <: GenSeq[A] with GenSeqLike[A, B]](ac: B)(p: Int): B = 
     | if(p == 0){
     |   ac 
     | } else {
     |   goFoo[A, B](ac.drop(1))(p-1)
     | }
goFoo: [A, B <: scala.collection.GenSeq[A] with scala.collection.GenSeqLike[A,B]](ac: B)(p: Int)B

scala> goFoo[Int, IndexedSeq[Int]](IndexedSeq(1, 2))(1)
res30: IndexedSeq[Int] = Vector(2)

And the higher kinded version

scala> @tailrec
     | def goFoo[A, B[A] <: GenSeq[A] with GenSeqLike[A, B[A]]](ac: B[A])(p: Int): B[A] = 
     | if(p == 0){
     |   ac 
     | } else {
     |   goFoo(ac.drop(1))(p-1)
     | }
goFoo: [A, B[A] <: scala.collection.GenSeq[A] with scala.collection.GenSeqLike[A,B[A]]](ac: B[A])(p: Int)B[A]

scala> goFoo(IndexedSeq(1, 2))(1)
res31: IndexedSeq[Int] = Vector(2)

Using GenSeqLike[A, Repr] Directly TL;DR

So I just want to say, unless you have a need for a more general solution don't do this. It is the hardest to understand and work with. We can't use SeqLike[A, Repr] because ParVector is not an instance of SeqLike, but we can use GenSeqLike[A, Repr], which both ParVector[A] and IndexedSeq[A] subclass.

That being said let's talk about how you could also solve this problem using GenSeqLike[A, Repr] directly.

Unpacking the type variables

First the easy one

A

This is just the type of the value in the collection, so for a Seq[Int] this would be Int.

Repr

This is the underlying type of the collection.

Scala collections implement most of their functionality in common traits, so that they don't have to duplicate code all over the place. Further, they wish to permit out of band types to function as though they are collections even if they don't inherit from a collections trait(I'm looking at you Array), and to allow client libraries/programs to add their own collection instances very easily while getting the bulk of the collection methods defined for free.

They are designed with two guiding constraints

Note: These examples are taken from the aforementioned article and are not my own.(linked again here for completeness http://docs.scala-lang.org/overviews/core/architecture-of-scala-collections.html)

The first constraint can be shown in the following example. A BitSet is a set of non-negative integers. If I perform the following operation, what should the result be?

BitSet(1).map(_+1): ???

The correct answer was a BitSet. I know that seemed rather obvious, but consider the following. What is the type of this operation?

BitSet(1).map(_.toFloat): ???

It can't be a BitSet, right? Because we said that BitSet values are non-negative integers. So it turns out to be a SortedSet[Float].

The Repr parameter, combined with an appropriate CanBuildFrom instance(I explain what this is in a second) is one of the primary mechanisms that allows for the returning the most specific type possible. We can see this by sort of cheating the system on the REPL. Consider the following, Vector is both a subclass of IndexedSeq and Seq. So what if we do this...

scala> val x: GenSeqLike[Int, IndexedSeq[Int]] = Vector(1)
x: scala.collection.SeqLike[Int,IndexedSeq[Int]] = Vector(1)

scala> 1 +: x
res26: IndexedSeq[Int] = Vector(1, 1)

See how the final type here was IndexedSeq[Int]. This was because we told the type system that the underlying representation of the collection was IndexedSeq[Int] so it tries to return that type if possible. Now watch this,

scala> val x: GenSeqLike[Int, Seq[Int]] = Vector(1)
x: scala.collection.SeqLike[Int,Seq[Int]] = Vector(1)

scala> 1 +: x
res27: Seq[Int] = Vector(1, 1)

Now we get a Seq out.

So scala collections try to give you the most specific type for your operation, while still allowing for huge amounts of code reuse. They do this by leveraging the Repr type, as we as a CanBuildFrom(still getting to it) I know your probably wondering what this has to do with your question, don't worry we're getting to that right now. I am not going to say anything on Liskov's Substitution Principle, as it doesn't pertain much to your specific question (but you should still read about it!)

Okay, so now we understand that GenSeqLike[A, Repr] is the trait that scala collections use to reuse the code for Seq (and other things like Seq). And we understand that Repr is used to store the underlying collection representation to help inform the type of collection to return. How this last point works we have yet to explain, so let's do that now!

CanBuildFrom[-From, -Elem, +To]

A CanBuildFrom instance is how the collections library knows how to build the result type of a given operation. For instance the real type of the +: method on SeqLike[A, Repr] is this.

 abstract def +:[B >: A, That](elem: B)(implicit bf: CanBuildFrom[Repr, B, That]): That 

This means that in order to prepend an element to a GenSeqLike[A, Repr] we need an instance of CanBuildFrom[Repr, B, That] where Repr is the type of our current collection, B is a supertype of the elements that we have in our collection, and That is they type of collection we will have after the operation is done. I am not going to get into the internals of how CanBuildFrom works (again see the linked article for the details), for now just believe me that this is what it does.

Putting it all together

So now we are ready to build an instance of goFoo that works with GenSeqLike[A, Repr] values.

scala> def goFoo[A, Repr <: GenSeqLike[A, Repr]](ac: Repr)(p: Int)(implicit cbf: CanBuildFrom[Repr, A, Repr]): Repr = ac
goFoo: [A, Repr <: scala.collection.GenSeqLike[A,Repr]](ac: Repr)(p: Int)(implicit cbf: scala.collection.generic.CanBuildFrom[Repr,A,Repr])Repr

scala> goFoo[Int, IndexedSeq[Int]](IndexedSeq(1))(1)
res7: IndexedSeq[Int] = Vector(1)

scala> goFoo[Int, ParVector[Int]](new ParVector(Vector(1)))(1)
res8: scala.collection.parallel.immutable.ParVector[Int] = ParVector(1)

What we are saying here, is that there is a CanBuildFrom that will take a subclass of GenSeqLike of type Repr over elements A and build a new Repr. This means that we can perform any operation on Repr type that will result in a new Repr, or in the specific case a new ParVector or IndexedSeq.

Unfortunately we must provide the generic parameters manually or the type system gets confused. Thankfully we can use higher kinds again to avoid this,

scala> def goFoo[A, Repr[A] <: GenSeqLike[A, Repr[A]]](ac: Repr[A])(p: Int)(implicit cbf: CanBuildFrom[Repr[A], A, Repr[A]]): Repr[A] = ac
goFoo: [A, Repr[A] <: scala.collection.GenSeqLike[A,Repr[A]]](ac: Repr[A])(p: Int)(implicit cbf: scala.collection.generic.CanBuildFrom[Repr[A],A,Repr[A]])Repr[A]

scala> goFoo(IndexedSeq(1))(1)
res16: IndexedSeq[Int] = Vector(1)

scala> goFoo(new ParVector(Vector(1)))(1)
res17: scala.collection.parallel.immutable.ParVector[Int] = ParVector(1)

So this is nice, in that it is a little more general than using GenSeq, but it is also way more confusing. I would not recommend doing this for anything other than a thought experiment.

Conclusion

While it has hopefully been informative to learn about how scala collections work to use GenSeqLike directly, I could hardly think of a use case where I would actually recommend it. The code is hard to understand, hard to work with, and may very well have some edge cases that I have missed. In general I would recommend avoiding interacting with scala collections implementation traits, such as GenSeqLike as much as is possible, unless you are installing your own collection into the system. You still have to touch GenSeqLike lightly to get all of the operations in GenSeq by giving it the correct Repr type, but you can avoid thinking about CanBuildFrom values.

Upvotes: 3

Related Questions