Tutorial to start with program and datatype refinement

Question

I read the Code generation from Isabelle/HOL theories manual. However, I still feel a bit lost. Where do I need things like linorder? How can I use e.g., red-black trees to make things faster? How is a locale used in the context of program refinement? ...

Is there some tutorial to get started with refinement? If not, is there some short, self contained, correct Example?

Can we develop an example?

Let's assume we have A :: 'a set and we know finite A. How do we proceed to generate for example efficient code for a ∈ A?

How do we express our knowledge of finite A. How can we keep the mathematical theory (a ∈ A) separate from the code generation and optimization?

corny · Accepted Answer

Okay, I'll try my best to answer this question. Please feel free to comment and edit to improve it. If you want the karma, you can also copy this answer and repost it (If your answer is better, I'll delete my answer).

This user-guide helped me to get started.

Okay, let's assume we have our theory in the locale MyLocale.

locale MyLocale =
  fixes A :: "'a set"
  begin
    definition isInA :: "'a => bool" where
      "isInA a ⟷ a ∈ A"
  end

Let's assume we implement the isInA function by implementing the locale's set A by a list. Then, we can generate code and show its correctness

definition code_isInA_list :: "'a list => 'a => bool"
  where "code_isInA_list Al a ⟷ a ∈ set Al"

lemma code_isInA_list_correct: 
 "code_isInA_list Al a = MyLocale.isInA (set Al) a"
  by(simp add: MyLocale.isInA_def code_isInA_list_def)

export_code code_isInA_list in Scala file -

This basically yields the following Scala code

def code_isInA_list[A : HOL.equal](al: List[A], a: A): Boolean =
  Lista.member[A](al, a)

The Lista.member functions is a linear-time algorithm. Can we do better?

Let's assume we have a linorder on our type 'a. For example, we have linorder on natural numbers, strings, lists, tuples, ... We express this assumption by writing 'a ::linorder instead of 'a. One datatype that can leverage the linorder property is a red-black-tree. In our locale, A was a set. In sets, the order of elements does not matter. So we can use the more efficient red-black tree instead of lists. The red-black tree can be more efficient than lists because the elements can be ordered arbitrarily while lists dictate a fixed order. To use red-black trees, we import the Collections framework from the archive of formal proofs. At the beginning of our thy-file we add to the imports section

"./isabelle_afp/Collections/Collections"

Now we can implement a red-black tree (rs) version and show its correctness. The function rs_α (type alpha in jEdit for the alpha) from the Collections framework maps the red-black tree to its corresponding set.

definition code_isInA_rs :: "('a ::linorder) rs => 'a => bool"
  where "code_isInA_rs Al a ⟷ rs_memb a Al"

lemma code_isInA_rs_correct: 
 "code_isInA_rs Ars a = MyLocale.isInA (rs_α Ars) a"
  by(simp add: MyLocale.isInA_def code_isInA_rs_def rs_correct)

export_code code_isInA_rs in Scala file -

We yield the following code

def code_isInA_rs[A : Orderings.linorder](al: RBT.rbt[A, Unit], a: A): Boolean =
  RBTSetImpl.rs_memb[A].apply(a).apply(al)

The rs_memb function has a logarithmic runtime. TODO: really? where can I look it up?

Let's further improve our code. Let's assume we want to stick to the code_isInA_list version, because lists are easier to us in our code. We can implement code_isInA_list in terms of code_isInA_rs. Therefore, we use list_to_rs that converts a list into a red-black tree. With the [code] attribute, we tell the code generator to use the new version.

lemma [code]: "code_isInA_list Al a ⟷ code_isInA_rs (list_to_rs Al) a"
  by(simp add: code_isInA_list_def code_isInA_rs_def rs_correct)

export_code code_isInA_list in Scala file -

We yield the following code.

def code_isInA_list[A : Orderings.linorder](al: List[A], a: A): Boolean =
  code_isInA_rs[A](RBTSetImpl.list_to_rs[A].apply(al), a)

I guess creating a red-black and then do the lookup in it is more expensive than the simpler list version, but this is just an example. If we need more lookups, the red-black tree version can definitely be more efficient.

Let's test the code in Isabelle jEdit with some types. We start with natural numbers.

value[code] "code_isInA_list [(1::nat), 3, 7, 4] 4"

This results in True

Next, we try strings.

value[code] "code_isInA_list [''a'', ''b'', ''xyz''] ''b''"

This results in

Wellsortedness error:
  Type char list not of sort linorder
  No type arity list :: linorder

The error message tells us that no linorder is defined on the string type. So we import the following theories.

"~~/src/HOL/Library/List_lexord"
"~~/src/HOL/Library/Code_Char"
"~~/src/HOL/Library/Code_Char_chr"
"~~/src/HOL/Library/Code_Char_ord"

Now we get the desired result True. The code even works for tuples.

value[code] "code_isInA_list [(''a'', ''a''), (''b'', ''c''), (''xyz'', '''')] (''b'', ''c'')"

Tutorial to start with program and datatype refinement

Answers (1)

Related Questions