Reputation: 49705
Note: This is an FAQ, asked specifically so I can answer it myself, as this issue seems to come up fairly often and I want to put it in a location where it can (hopefully) be easily found via a search.
As prompted by a comment on my answer here
For example:
"abcde" map {_.toUpperCase} //returns a String
"abcde" map {_.toInt} // returns an IndexedSeq[Int]
BitSet(1,2,3,4) map {2*} // returns a BitSet
BitSet(1,2,3,4) map {_.toString} // returns a Set[String]
Looking in the scaladoc, all of these use the map operation inherited from TraversableLike, so how come it's always able to return the most specific valid collection? Even String, which provides map via an implicit conversion.
Upvotes: 60
Views: 4488
Reputation: 49705
Scala collections are clever things...
The internals of the collection library are one of the more advanced topics in the land of Scala. They involve higher-kinded types, inference, variance, implicits, and the CanBuildFrom mechanism - all to make the library incredibly generic, easy to use, and powerful from a user-facing perspective. Understanding it from the point of view of an API designer is not a task to be taken on lightly by a beginner.
On the other hand, it's incredibly rare that you'll ever actually need to work with collections at this depth.
So let us begin...
With the release of Scala 2.8, the collection library was completely rewritten to remove duplication. A great many methods were moved to just one place so that ongoing maintenance and the addition of new collection methods would be far easier, but this also makes the hierarchy harder to understand.
Take List, for example. It inherits from (in turn):
LinearSeqOptimized
GenericTraversableTemplate
LinearSeq
Seq
SeqLike
Iterable
IterableLike
Traversable
TraversableLike
TraversableOnce
That's quite a handful! So why this deep hierarchy? Ignoring the XxxLike traits briefly, each tier in that hierarchy adds a little bit of functionality, or provides a more optimised version of inherited functionality (for example, fetching an element by index on a Traversable requires a combination of drop and head operations, grossly inefficient on an indexed sequence). Where possible, all functionality is pushed as far up the hierarchy as it can possibly go, maximising the number of subclasses that can use it and removing duplication.
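To make the indexing remark concrete, here is a minimal sketch. The helper nth and the object name are hypothetical, and Iterable is used in place of Traversable on the assumption that the code should also compile on Scala versions where Traversable is deprecated.

```scala
object IndexingSketch {
  // Hypothetical helper: index lookup using only operations that every
  // Iterable offers. A generic drop has to step over n elements before
  // head can return the next one.
  def nth[A](xs: Iterable[A], n: Int): A = xs.drop(n).head

  val data = Vector(10, 20, 30, 40)

  // Works for any Iterable, but knows nothing about random access.
  val viaTraversal: Int = nth(data, 2)

  // Uses the apply method that indexed sequences provide directly.
  val viaIndex: Int = data(2)
}
```

Both values are the same element; the difference is only in how much work a general-purpose implementation has to do to reach it.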
map is just one such example. The method is implemented in TraversableLike (though the XxxLike traits only really exist for library designers, so it's generally considered to be a method on Traversable for most intents and purposes - I'll come to that part shortly), and is widely inherited. It's possible to define an optimised version in some subclass, but it must still conform to the same signature. Consider the following uses of map (as also mentioned in the question):
"abcde" map {_.toUpperCase} //returns a String
"abcde" map {_.toInt} // returns an IndexedSeq[Int]
BitSet(1,2,3,4) map {2*} // returns a BitSet
BitSet(1,2,3,4) map {_.toString} // returns a Set[String]
In each case, the output is of the same type as the input wherever possible. When it's not possible, superclasses of the input type are checked until one is found that does offer a valid return type. Getting this right took a lot of work, especially when you consider that String isn't even a collection, it's just implicitly convertible to one.
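One way to check these result types yourself is to add type ascriptions and let the compiler do the verification. This is only a sketch: the object and value names are made up for illustration, toUpper is used because I'm certain it exists on Char in current Scala versions, and _ * 2 replaces the postfix form 2* to avoid relying on postfix syntax.

```scala
import scala.collection.immutable.BitSet

object MapResultTypes {
  // Each ascription is checked at compile time, so this object only compiles
  // if map returns the annotated type (or a subtype of it).
  val upper: String = "abcde".map(_.toUpper)
  val codes: IndexedSeq[Int] = "abcde".map(_.toInt)
  val doubled: BitSet = BitSet(1, 2, 3, 4).map(_ * 2)
  val strings: Set[String] = BitSet(1, 2, 3, 4).map(_.toString)
}
```

Swapping an ascription for something that doesn't fit, such as declaring strings as a BitSet, should be rejected by the compiler, as no BitSet can be built from String elements.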
So how is it done?
One half of the puzzle is the XxxLike traits (I did say I'd get to them...), whose main function is to take a Repr type param (short for "Representation") so that they'll know the true subclass actually being operated on. So e.g. TraversableLike is the same as Traversable, but abstracted over the Repr type param. This param is then used by the second half of the puzzle: the CanBuildFrom type class, which captures the source collection type, target element type and target collection type to be used by collection-transforming operations.
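The Repr half can be sketched in isolation. The following is a deliberately tiny model, not the library's code: MiniLike, Digits, Word and rebuild are all hypothetical names, and the only point is that a method written once in the XxxLike-style trait can return the concrete subclass.

```scala
object ReprSketch {
  // Hypothetical, heavily simplified analogue of TraversableLike.
  // Repr is the concrete collection type that same-type operations return.
  trait MiniLike[A, Repr] {
    def elems: List[A]

    // Each concrete collection says how to rebuild its own type.
    protected def rebuild(kept: List[A]): Repr

    // Implemented once here, yet statically typed as the caller's own type.
    def filter(p: A => Boolean): Repr = rebuild(elems.filter(p))
  }

  final class Digits(val elems: List[Int]) extends MiniLike[Int, Digits] {
    protected def rebuild(kept: List[Int]): Digits = new Digits(kept)
  }

  final class Word(val elems: List[Char]) extends MiniLike[Char, Word] {
    protected def rebuild(kept: List[Char]): Word = new Word(kept)
    override def toString: String = elems.mkString
  }

  private val vowelSet = Set('a', 'e', 'i', 'o', 'u')

  // No casts needed: the inherited filter already returns Digits and Word.
  val evens: Digits = new Digits(List(1, 2, 3, 4)).filter(_ % 2 == 0)
  val vowels: Word = new Word("abcde".toList).filter(vowelSet)
}
```

This covers operations such as filter that keep the element type. For map, where the element type can change, Repr alone is not enough, which is where the type class comes in.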
It's easier to explain with an example!
BitSet defines an implicit instance of CanBuildFrom like this:
implicit def canBuildFrom: CanBuildFrom[BitSet, Int, BitSet] = bitsetCanBuildFrom
When compiling BitSet(1,2,3,4) map {2*}, the compiler will attempt an implicit lookup of CanBuildFrom[BitSet, Int, T].
This is the clever part... The compiler picks the most specific implicit in scope that matches the first two type parameters. The first parameter is Repr, as captured by the XxxLike trait, and the second is the element type of the result, i.e. the return type of the function passed to map. The map operation is also parameterised with a type T, which is inferred from the third type parameter of the CanBuildFrom instance that was implicitly located: BitSet in this case.
So the first two type parameters to CanBuildFrom are inputs, to be used for implicit lookup, and the third parameter is an output, to be used for inference.
CanBuildFrom in BitSet therefore matches the two types BitSet and Int, so the lookup will succeed, and the inferred return type will also be BitSet.
When compiling BitSet(1,2,3,4) map {_.toString}, the compiler will attempt an implicit lookup of CanBuildFrom[BitSet, String, T]. This will fail for the implicit in BitSet, as its element type is Int, so the compiler falls back to the implicit found via its superclass, Set, whose companion contains:
implicit def canBuildFrom[A]: CanBuildFrom[Coll, A, Set[A]] = setCanBuildFrom[A]
Which matches, because Coll is a type alias for Set[_] in the Set companion, and CanBuildFrom is contravariant in its first type parameter, so an instance declared for any Set is also accepted when the source is a BitSet. The A will match anything, as canBuildFrom is parameterised with the type A; in this case it's inferred to be String, thus yielding a return type of Set[String].
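The two lookups can be reproduced with a small stand-alone model. Everything here (MiniBuildFrom, MiniLike, MiniSet, MiniBits) is a hypothetical stand-in written for this answer, not the library's API, and it assumes Scala 2 implicit resolution rules. It mirrors the shape described above: a specific instance in the companion of the Int-only collection, and a generic one in the companion of its supertype.

```scala
object BuildFromSketch {
  // Stand-in for CanBuildFrom: From and Elem are the lookup inputs,
  // To is the output declared by whichever instance is selected.
  trait MiniBuildFrom[-From, -Elem, +To] {
    def build(elems: List[Elem]): To
  }

  // Plays the role of TraversableLike: Repr becomes the From type of the lookup.
  trait MiniLike[A, Repr] {
    def elems: List[A]
    def map[B, That](f: A => B)(implicit bf: MiniBuildFrom[Repr, B, That]): That =
      bf.build(elems.map(f))
  }

  // Plays the role of Set: can hold any element type.
  trait MiniSet[A] { def elems: List[A] }

  object MiniSet {
    def fromList[A](xs: List[A]): MiniSet[A] =
      new MiniSet[A] { val elems: List[A] = xs.distinct }

    // Generic instance. Thanks to contravariance in From it is accepted
    // for any source that is a MiniSet, including MiniBits.
    implicit def canBuildFrom[A]: MiniBuildFrom[MiniSet[_], A, MiniSet[A]] =
      new MiniBuildFrom[MiniSet[_], A, MiniSet[A]] {
        def build(elems: List[A]): MiniSet[A] = fromList(elems)
      }
  }

  // Plays the role of BitSet: can only hold Ints.
  final class MiniBits(val elems: List[Int])
      extends MiniSet[Int] with MiniLike[Int, MiniBits]

  object MiniBits {
    // Specific instance: only usable when the result elements are Ints.
    implicit def canBuildFrom: MiniBuildFrom[MiniBits, Int, MiniBits] =
      new MiniBuildFrom[MiniBits, Int, MiniBits] {
        def build(elems: List[Int]): MiniBits = new MiniBits(elems.distinct)
      }
  }

  val bits = new MiniBits(List(1, 2, 3, 4))

  // Int results: both instances are eligible, the more specific one wins.
  val doubled = bits.map(_ * 2)
  // String results: only the generic instance is eligible.
  val strings = bits.map(_.toString)

  // Compile-time checks of the inferred static types.
  val doubledCheck: MiniBits = doubled
  val stringsCheck: MiniSet[String] = strings
}
```

Note that map itself never mentions MiniBits or MiniSet; the result type comes entirely from the instance the compiler selects.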
So to correctly implement a collection type, you not only need to provide a correct implicit of type CanBuildFrom, but you also need to ensure that the concrete type of that collection is supplied as the Repr param to the correct parent traits (for example, this would be MapLike in the case of subclassing Map).
String is a little more complicated as it provides map by an implicit conversion. The implicit conversion is to StringOps, which subclasses StringLike[String], which ultimately derives TraversableLike[Char,String] - String being the Repr type param.
There's also a CanBuildFrom[String,Char,String] in scope so that the compiler knows that when mapping the elements of a String to Chars, then the return type should also be a String. From this point onwards, the same mechanism is used.
Upvotes: 81
Reputation: 41646
The Architecture of Scala Collections online pages have a detailed explanation geared towards the practical aspects of creating new collections based on the 2.8 collection design.
Quote:
"What needs to be done if you want to integrate a new collection class, so that it can profit from all predefined operations at the right types? On the next few pages you'll be walked through two examples that do this."
As examples, it uses a collection for encoding RNA sequences and one implementing a Patricia trie. Look for the "Dealing with map and friends" section for the explanation of what to do to return the appropriate collection type.
Upvotes: 8