How Can I Represent An Assembly Program With Labels?

Question

I am trying to represent assembly programs in Coq and compile them into a sequence of bytes.

I have an assembly language, a decoder, a compiler, and proof that the decoder is the inverse of the compiler for instructions satisfying a well-typedness property.

Now I'm working on creating representations of assembly code that are easier for humans to read and write which will be compiled to bytes. I'd like to support labels so I don't have to manually do instruction-length and offset arithmetic at every branch and jump. That's the difference between this representation and asm below.

The data is supposed to flow like this:

Assembly Representation (What I'm asking about)
                |
                V
Assembler (Incomplete but easy)
                |
                V
Assembly (Complete, see the `asm` type below)
                |
                V
        Compiler (Complete)
                |
                V
              Bytes
                |
                V
        Decoder (Complete)
                |
                V
Assembly (Complete, see the `asm` type below)

This is what I have so far:

Require Import NArith.

Inductive asm : Type :=
  | ADD (rd rs1 rs2:N)
  | BEQ (rs1 rs2:N) (si:Z)
  (* other assembly instructions *)
.

(* Tentative function for finding location of a label *)
Require Import String.
Require Import List.
Import ListNotations.

Open Scope string.
Open Scope N.
Definition insn_length (a:asm) : N := 4.

Fixpoint label_loc (label:string) (base_addr:N) (code:list (sum string asm)) : option N :=
  match code with
  | nil => None
  | inl label' :: t => if (label =? label')%string then Some base_addr 
      else label_loc label base_addr t
  | inr insn :: t => label_loc label (base_addr + (insn_length insn)) t
  end.

(* Ideal program representation is a list of labels and instructions
   and allows instructions to use label locations to compute immediates *)
Definition ideal_example (base_address : N) : list (sum string asm) :=
  [ inr (ADD 0 0 0) ;
    inl "label" ;
    inr (BEQ 0 0 (label_loc "label" base_address (ideal_example base_address)))
  ].

I tried mutually-recursive fixpoints thinking that maybe I could refer to "constant" defined after the program that calculate the label locations. I do not like this solution because of the additional layer of indirection and it's not really a fixpoint operation. As Coq points out "A fixpoint needs at least one parameter." Here's my attempt that throws that error :

Fixpoint mutual_recursion_example : list (sum string asm) :=
  [ inr (ADD 0 0 0) ;
    inl "label" ;
    inr (BEQ 0 0 klabel)
  ]
with klabel := label_loc "label" 0 mutual_recursion_example.

I'm considering creating another language for preprocessing, say pre_asm, but then I need to somehow prove that the step from pre_asm to asm is semantic preserving, so I hope to get some feedback first. I'm hoping I missed some useful pattern or insight that is well known in the community.

(* Possible pre-assembly language. *)
Inductive pre_asm :=
  | pADD (rd rs1 rs2:N)
  | pBEQ (rs1 rs2:N) (si_or_label:sum Z string)
.

Fixpoint preprocess (base_address:N) (code:list (sum string pre_asm)) : list asm
  := (* fill in here *).

How Can I Represent An Assembly Program With Labels?

Answers (1)

Related Questions