Reputation: 37
I am trying to represent assembly programs in Coq and compile them into a sequence of bytes.
I have an assembly language, a decoder, a compiler, and proof that the decoder is the inverse of the compiler for instructions satisfying a well-typedness property.
Now I'm working on creating representations of assembly code that are easier for humans to read and write which will be compiled to bytes. I'd like to support labels so I don't have to manually do instruction-length and offset arithmetic at every branch and jump. That's the difference between this representation and asm
below.
The data is supposed to flow like this:
Assembly Representation (What I'm asking about)
|
V
Assembler (Incomplete but easy)
|
V
Assembly (Complete, see the `asm` type below)
|
V
Compiler (Complete)
|
V
Bytes
|
V
Decoder (Complete)
|
V
Assembly (Complete, see the `asm` type below)
This is what I have so far:
Require Import NArith.
Inductive asm : Type :=
| ADD (rd rs1 rs2:N)
| BEQ (rs1 rs2:N) (si:Z)
(* other assembly instructions *)
.
(* Tentative function for finding location of a label *)
Require Import String.
Require Import List.
Import ListNotations.
Open Scope string.
Open Scope N.
Definition insn_length (a:asm) : N := 4.
Fixpoint label_loc (label:string) (base_addr:N) (code:list (sum string asm)) : option N :=
match code with
| nil => None
| inl label' :: t => if (label =? label')%string then Some base_addr
else label_loc label base_addr t
| inr insn :: t => label_loc label (base_addr + (insn_length insn)) t
end.
(* Ideal program representation is a list of labels and instructions
and allows instructions to use label locations to compute immediates *)
Definition ideal_example (base_address : N) : list (sum string asm) :=
[ inr (ADD 0 0 0) ;
inl "label" ;
inr (BEQ 0 0 (label_loc "label" base_address (ideal_example base_address)))
].
I tried mutually-recursive fixpoints thinking that maybe I could refer to "constant" defined after the program that calculate the label locations. I do not like this solution because of the additional layer of indirection and it's not really a fixpoint operation. As Coq points out "A fixpoint needs at least one parameter." Here's my attempt that throws that error :
Fixpoint mutual_recursion_example : list (sum string asm) :=
[ inr (ADD 0 0 0) ;
inl "label" ;
inr (BEQ 0 0 klabel)
]
with klabel := label_loc "label" 0 mutual_recursion_example.
I'm considering creating another language for preprocessing, say pre_asm
, but then I need to somehow prove that the step from pre_asm
to asm
is semantic preserving, so I hope to get some feedback first. I'm hoping I missed some useful pattern or insight that is well known in the community.
(* Possible pre-assembly language. *)
Inductive pre_asm :=
| pADD (rd rs1 rs2:N)
| pBEQ (rs1 rs2:N) (si_or_label:sum Z string)
.
Fixpoint preprocess (base_address:N) (code:list (sum string pre_asm)) : list asm
:= (* fill in here *).
Upvotes: 0
Views: 123
Reputation: 1138
[Here I just offer a possible design: I might complete it with (more) concrete definitions depending on the OP's feedback.]
Let HAsm
be the human-readable representation of instructions, i.e. optionally-labelled instructions (do not make "label" itself an instruction), where jump targets are labels (must be existing labels within a HProg
, i.e. a human-readable program). Let Asm
be the internal representation of instructions, unlabelled and where jump targets are address offsets (must be within-bounds within a Prog
, i.e. a program in the internal representation).
We/I'd want a bidirectional transform HProg <--> Prog
between programs in the two representations, but we cannot in fact get an invertible such transform, as Prog
does not (and shall not) keep track of labels in a corresponding HProg
: from a Prog
we can only synthesize labels based on some scheme (from a jump's target offset, possibly combined with the jump instruction's offset relative to the program's base address).
Instead, let [HProg]
denote classes of HProg
under alpha-equivalence by labels (i.e. treating labels like variable names). Then we can define an invertible transform [HProg] <--itransform--> Prog
, which, for correctness, we must prove establishes a 1-to-1 correspondence between [HProg]
and Prog
.
Further, let IProg
be the canonical representatives of classes in [HProg]
, and define a function hcanon : HProg -> IProg
that, for given HProg
, returns the canonical representative of the [HProg]
class it belongs to.
Finally, we can extend our transform chain to HProg --hcanon--> IProg <--itransform--> Prog
, where, to close the circle then for correctness, hcanon
must rewrite labels in the given HProg
in such a way that the returned IProg
is labelled exactly as the fixed labelling scheme to transform from Prog
to IProg
(was [HProg]
) would do.
Upvotes: 1