Kevin Meredith
Kevin Meredith

Reputation: 41919

Understanding ScalaChecks' 'generation size'

ScalaCheck's Gen API docs explain lazy val sized:

def sized[T](f: (Int) ⇒ Gen[T]): Gen[T]

Creates a generator that can access its generation size

Looking at the following example:

import org.scalacheck.Gen.{sized, posNum}

scala> sized( s => { println(s); posNum[Int] })
res12: org.scalacheck.Gen[Int] = org.scalacheck.Gen$$anon$3@6a071cc0

scala> res12.sample
100
res13: Option[Int] = Some(12)

scala> res12.sample
100
res14: Option[Int] = Some(40)

What is the meaning of generation size, namely 100 in the above output?

Upvotes: 5

Views: 1532

Answers (2)

user355252
user355252

Reputation:

sized provides access to the "size" parameter of Scalacheck. This parameter indicates how "big" the values generated by a generator should be. This parameter is useful in a couple of situations:

  • You want to limit the amount of values generated to make property generation and thus test run faster.
  • You need to fit generated data into external constraints, like form validators which check the length of a string, or databases which put limits on columns.
  • You need to generate recursive data structures and terminate at some point.

The companion of Gen.sized is Gen.resize which lets you change the size of a generator, as in Gen.resize(10, Gen.alphaNumString) which will generate a alpha-numeric string of no more than ten characters.

Most built-in generators use sized in some way, for instance Gen.buildableOf (which is the underpinning of all generators for lists and containers):

[…] The size of the container is bounded by the size parameter used when generating values.

A simple example

To get an idea of how Gen.size is used take a look at the example in "Sized Generators" (Generators, Scalacheck user guide):

def matrix[T](g: Gen[T]): Gen[Seq[Seq[T]]] = Gen.sized { size =>
  val side = scala.math.sqrt(size).asInstanceOf[Int]
  Gen.listOfN(side, Gen.listOfN(side, g))
}

This generator uses the "size" to limit the dimensions of the matrix so that the entire matrix will never have more than entries than the "size" parameter. In other words, with a size of 100 as in your question, the generated matrix would have 10 rows and 10 columns, amounting to a total of 100 entries.

Recursive data structures

"size" is particularly useful to make sure that generators for recursive data structures terminate. Consider the following example which generates instances of a binary tree and uses size to limit the height of each branch to make sure that the generator terminates at some point:

import org.scalacheck.Gen
import org.scalacheck.Arbitrary.arbitrary

sealed abstract class Tree
case class Node(left: Tree, right: Tree, v: Int) extends Tree
case object Leaf extends Tree

val genLeaf = Gen.const(Leaf)
val genNode = for {
  v <- arbitrary[Int]
  left <- Gen.sized(h => Gen.resize(h/2, genTree))
  right <- Gen.sized(h => Gen.resize(h/2, genTree))
} yield Node(left, right, v)

def genTree: Gen[Tree] = Gen.sized { height =>
  if (height <= 0) {
    genLeaf
  } else {
    Gen.oneOf(genLeaf, genNode)
  }
}

Note how the generator for nodes recursively generates trees, but allows them only half of the "size". The tree generator in turn will only generate leafs if its size is exhausted. Thus the "size" of the generator is an upper bound for the height of the generated tree, ensuring that the generator terminates at some point and does not generate excessively large trees.

Note that the size only sets an upper bound for the height of the tree in this example. It does not affect the balancing of the generated tree or the likeliness of generating a tree with a certain depth. These depend solely on the bias defined in genTree.

With oneOf each subtree has a 50% chance of being just a Leaf, ending the growth of the tree at this branch, which makes generating a complete tree that exhausts the "whole" size somewhat unlikely.

frequency (see below) let's you encode a different bias. In the example below nodes a way more likely than leafs, so the tree generated by the generator below is more likely to grow, but it's still unlikely to be complete.

Relation to Gen.frequency

Gen.frequency is for a different use case: You would not use it to limit the depths or size of a data structure, but to add a certain bias to a choice of generators. Take a look at the definition of Gen.option:

def option[T](g: Gen[T]): Gen[Option[T]] =
  frequency(1 -> const(None), 9 -> some(g))

This definition uses frequency to make the less-interesting case of None less likely than the more interesting case of Some.

In fact, we could combine Gen.sized and Gen.frequency in our binary tree example above to make genTree more likely to generate "interesting" nodes rather than "boring" leafs:

def genTree: Gen[Tree] = Gen.sized { height =>
  if (height <= 0) {
    genLeaf
  } else {
    Gen.frequency(1 -> genLeaf, 9 -> genNode)
  }
}

Upvotes: 13

Vidya
Vidya

Reputation: 30320

Generation size is the number of results the generator will produce. The sized method simply lets you write generators that know their own size so you can use that information as a factor in what you generate.

For example, this generator (from this resource) produces two lists of numbers where 1/3 are positive and 2/3 are negative:

import org.scalacheck.Gen
import org.scalacheck.Prop.forAll

val myGen = Gen.sized { size =>
  val positiveNumbers = size / 3
  val negativeNumbers = size * 2 / 3
  for {
    posNumList <- Gen.listOfN(positiveNumbers, Gen.posNum[Int])
    negNumList <- Gen.listOfN(negativeNumbers, Gen.posNum[Int] map (n => -n))
  } yield (size, posNumList, negNumList)
}

check {
  forAll(myGen) {
    case (genSize, posN, negN) =>
     posN.length == genSize / 3 && negN.length == genSize * 2 / 3
  }
}

So sort of like zipWithIndex in Scala collections, sized just provides you meta information to help you do what you need to do.

Upvotes: 1

Related Questions