Reputation: 1953
I am new to Scala and am building tools for statistical estimation. Consider the following: a trait probabilityDistribution is defined, which guarantees that classes inheriting from it can perform certain operations, such as computing a density. Two examples of such probability distributions are the binomial and the beta distribution. The supports of these two distributions are Int and Double, respectively.
Set Up
trait probabilityDistribution extends Serializable {
  type T
  def density(x: T): Double
}
case class binomial(n: Int, p: Double) extends probabilityDistribution {
  type T = Int
  def density(x: Int): Double = x*p
}
case class beta(alpha: Double, beta: Double) extends probabilityDistribution {
  type T = Double
  def density(x: Double): Double = x*alpha*beta
}
Note that the actual mathematical implementations of the density methods are simplified above. Now, consider a mixture model, in which we have several features or variables that come from different distributions. We may choose to create a list of probabilityDistribution instances to represent our features.
val p = List(binomial(5, .5), beta(.5,.5))
Suppose that we now want to supply a vector of hypothetical data values and query the density function of each respective probability distribution.
val v = List[Any](2, 0.75)
The Problem
Of course, we use a zip with map. However, this doesn't compile:
p zip v map { case (x,y) => x.density(y) }
// error: type mismatch;
//  found   : Any
//  required: x.T
Caveat: Choice of Container
A valid question is why I have chosen List[Any] as the container to hold data values, rather than List[Double], or perhaps List[T <: Double]. Consider the case when some of our probability distributions have a support over vectors or even matrices (e.g. multivariate normal and inverse Wishart).
One idea to address the caveat is to house our input values in a container that is more representative of our input type, e.g. something like
class likelihoodSupport
val v = List[likelihoodSupport](...)
where Int, Double, Array[Double], and even a tuple (Array[Double], Array[Array[Double]]) all inherit from likelihoodSupport. As some of these classes are final, however, this is not possible.
One (Crummy) Fix
Note that this can be handled by using pattern matching and a polymorphic method within each subclass but, as Odersky might say, this has a code smell:
trait probabilityDistribution extends Serializable {
  type T
  def density[T](x: T): Double
}
case class binomial(n: Int, p: Double) extends probabilityDistribution {
  type T = Int
  def density[U](x: U): Double = x match { case arg: Int => arg * p }
}
case class beta(alpha: Double, beta: Double) extends probabilityDistribution {
  type T = Double
  def density[U](x: U): Double = x match { case arg: Double => arg * alpha * beta }
}
We can now run
p zip v map { case (x,y) => x.density(y) }
Plea: I know that what I'm trying to do should be very easily accomplished in such a beautiful and powerful language, but I can't figure out how! Your help is much appreciated.
Note: I am not interested in using additional packages/imports, as I feel this problem should be trivially solvable in base Scala.
Upvotes: 2
Views: 120
Reputation: 170859
You can't do it given the separate p and v lists (at least not without casts, or without writing your own HList library). This should be obvious: if you change the order of elements in one of these lists, the types won't change (unlike for an HList), but distributions will now be paired with values of the wrong type!
The simplest approach is to add a cast:
p zip v map { case (x,y) => x.density(y.asInstanceOf[x.T]) }
Note that this cast may be a no-op at runtime, so a value of the wrong type leads to a ClassCastException inside the density call instead, thanks to JVM type erasure.
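To make the erasure point concrete, here is a minimal sketch that reuses the question's simplified definitions. The names ok and swapped are hypothetical and introduced only for this illustration; the example assumes it is run as a script or in the REPL.

```scala
import scala.util.Try

trait probabilityDistribution extends Serializable {
  type T
  def density(x: T): Double
}
case class binomial(n: Int, p: Double) extends probabilityDistribution {
  type T = Int
  def density(x: Int): Double = x * p
}
case class beta(alpha: Double, beta: Double) extends probabilityDistribution {
  type T = Double
  def density(x: Double): Double = x * alpha * beta
}

val p: List[probabilityDistribution] = List(binomial(5, 0.5), beta(0.5, 0.5))
val v = List[Any](2, 0.75)

// Values line up with their distributions: the cast is harmless.
val ok = p.zip(v).map { case (x, y) => x.density(y.asInstanceOf[x.T]) }

// Values in the wrong order: the cast itself does not fail (x.T is erased),
// the failure only surfaces when density unboxes its argument.
val swapped = Try(p.zip(v.reverse).map { case (x, y) => x.density(y.asInstanceOf[x.T]) })
```

With the lists aligned, ok holds the two densities; with the values reversed, swapped is a Failure wrapping a ClassCastException rather than a compile-time error.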
If you want a safer alternative to the cast, something like this should work (see http://docs.scala-lang.org/overviews/reflection/typetags-manifests.html for more information on ClassTags and related types):
import scala.reflect.{ClassTag, classTag}

// note that generics do buy you some convenience in this case:
// abstract class probabilityDistribution[T](implicit val tag: ClassTag[T]) extends Serializable
// will mean you don't need to set tag explicitly in subtypes
trait probabilityDistribution extends Serializable {
  type T
  implicit val tag: ClassTag[T]
  def density(x: T): Double
}
case class binomial(n: Int, p: Double) extends probabilityDistribution {
  type T = Int
  val tag = classTag[Int]
  def density(x: Int): Double = x*p
}
p zip v map { case (x, y) =>
  // bring the distribution's ClassTag into scope so the type pattern below is checked at runtime
  implicit val tag: ClassTag[x.T] = x.tag
  y match {
    case y: x.T => ...
    case _ => ...
  }
}
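A filled-in version of the ClassTag sketch above might look as follows. Returning Option[Double] for mismatched values, and the names results and mismatched, are assumptions made for this illustration and not part of the original code; throwing or logging in the fallback case would be equally reasonable.

```scala
import scala.reflect.{ClassTag, classTag}

trait probabilityDistribution extends Serializable {
  type T
  implicit val tag: ClassTag[T]
  def density(x: T): Double
}
case class binomial(n: Int, p: Double) extends probabilityDistribution {
  type T = Int
  val tag = classTag[Int]
  def density(x: Int): Double = x * p
}
case class beta(alpha: Double, beta: Double) extends probabilityDistribution {
  type T = Double
  val tag = classTag[Double]
  def density(x: Double): Double = x * alpha * beta
}

// Hypothetical helper: evaluate each density only if the value has the right runtime type.
def densities(ps: List[probabilityDistribution], vs: List[Any]): List[Option[Double]] =
  ps.zip(vs).map { case (x, y) =>
    implicit val tag: ClassTag[x.T] = x.tag
    y match {
      case t: x.T => Some(x.density(t)) // checked through the ClassTag
      case _      => None               // wrong type: no density, no exception
    }
  }

val p: List[probabilityDistribution] = List(binomial(5, 0.5), beta(0.5, 0.5))
val v = List[Any](2, 0.75)

val results    = densities(p, v)
val mismatched = densities(p, v.reverse)
```

Compared with the plain cast, a value of the wrong type here falls through to the second case instead of blowing up inside density.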
Or you can combine distributions and values (or data structures containing values, functions returning values, etc.):
// alternately DistribWithValue(d: probabilityDistribution)(x: d.T)
case class DistribWithValue[A](d: probabilityDistribution { type T = A }, x: A) {
  def density = d.density(x)
}
val pv: List[DistribWithValue[_]] = List(DistribWithValue(binomial(5, .5), 2), DistribWithValue(beta(.5,.5), 0.75))
// if you want p and v on their own
val p = pv.map(_.d)
val v = pv.map(_.x)
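As a usage sketch for the pairing approach, assuming the question's simplified distributions: once each value travels with its distribution, the densities can be computed without casts. The name densities is hypothetical, and the type arguments are written out only for clarity.

```scala
trait probabilityDistribution extends Serializable {
  type T
  def density(x: T): Double
}
case class binomial(n: Int, p: Double) extends probabilityDistribution {
  type T = Int
  def density(x: Int): Double = x * p
}
case class beta(alpha: Double, beta: Double) extends probabilityDistribution {
  type T = Double
  def density(x: Double): Double = x * alpha * beta
}

// The refinement { type T = A } ties the value's type to the distribution's support.
case class DistribWithValue[A](d: probabilityDistribution { type T = A }, x: A) {
  def density: Double = d.density(x)
}

val pv: List[DistribWithValue[_]] = List(
  DistribWithValue[Int](binomial(5, 0.5), 2),
  DistribWithValue[Double](beta(0.5, 0.5), 0.75)
)

// No cast and no runtime type test: each pair was checked when it was built.
val densities = pv.map(_.density)
```

Pairing a binomial with a Double, e.g. DistribWithValue[Int](binomial(5, 0.5), 0.75), is rejected at compile time, which is the guarantee the separate lists cannot give.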
Of course, if you want to use a probabilityDistribution as a method argument, as the question title says, it's simple, for example:
def density(d: probabilityDistribution)(xs: List[d.T]) = xs.map(d.density _)
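A short usage sketch of that dependent method type, again assuming the question's simplified binomial. The names b and ds are hypothetical; the distribution is bound to a val first so that d.T resolves to a concrete type at the call site.

```scala
trait probabilityDistribution extends Serializable {
  type T
  def density(x: T): Double
}
case class binomial(n: Int, p: Double) extends probabilityDistribution {
  type T = Int
  def density(x: Int): Double = x * p
}

// The type of the second parameter list depends on the first argument.
def density(d: probabilityDistribution)(xs: List[d.T]): List[Double] = xs.map(d.density)

val b = binomial(5, 0.5)
val ds = density(b)(List(0, 1, 2)) // b.T is Int, so a List[Int] is accepted
```

Here the compiler knows b.T is Int, so passing List(0.5) instead would be a type error rather than a runtime failure.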
The problems only arise specifically when "the user may wish to make multiple density queries with different x values that are not intrinsically related to the probability distribution itself" and the compiler can't prove that these values have the correct type.
Upvotes: 2