光阴冢

We make choices and life has a way of making us pay for them.

Concrete Semantics

Jan 3, 2020  

(默认没有 Isabelle 的高亮,因此暂时标注成了 OCaml )

HOL and other examples in Section 2 is very similar to SF, thus it will be admitted.

Related exercises: https://github.com/megrxu/cs-exercises

Syntax of Isabelle

type_synonym string = "char list"
(* type alias *)

datatype nat = O | Suc nat
(* destruct *)

abbreviation sq 0 :: "nat => nat" where
"sq 0 n ≡ n ∗ n"
(* abbreviations, syntactic sugar *)

fun add :: "nat => nat => nat" where
"add O n = n" |
"add (Suc m) n = Suc(add m n)"
(* function definition *)
(* recursive function requires the arguments of recursive calls on the right-hand side must be strictly smaller tha nthearguments on the left-hand side *)
(* same as Coq *)

lemma add_ 02: "add m 0 = m"
apply(induction m)
apply(auto)
done
(* lemma and proof is similar to Coq *)

value "rev(Cons True (Cons False Nil))"
(* display the value *)

lemma app_Nil2 [simp]: "app xs Nil = xs"
apply(induction xs)
apply(auto)
done
(* induction by xs *)
(* [simp]: make the eventual theorem(lemma) a simplification rule *)

lemma "div2(n) = n div 2"
apply(induction n rule: div2.induct )
(* another proof using induction without auto *)

Tactics

  • simp
  • auto
  • induct
  • split

Notes in Section 2 (Programming and Proving)

  • induction heuristics

    • computation induction: Let f be a recursive function. If the definition of f is more complicated than having one equation for each constructor of some data type, then properties of f are best proved via f.induct.
    • more than one arguments: Perfom induction on argument number i if the function is defined by recursion on argument number i.
    • generalize the goal before induction
  • simp: using simplification rules

    • using equation l = r from left to right (only)
    • as long as possible
    • arrtibute [simp] declares theorems to be simplification rules.
    • case split: apply(simp split: nat.split)

Notes in Section 3 (Case Study: IMP Expressions)

Syntex and Semantics

  • syntex
    • actually strings
    • abstract syntax trees
  • semantics
    • the meaning and values of the syntex
type_synonym vname = string
datatype aexp = N int | V vname | Plus aexp aexp
(* syntex *)

type_synonym val = int
type_synonym state = vname => val
(* semantics *)

Constant Folding

Program optimization, simplify the exprs.

fun asimp_const :: "aexp => aexp" where
  "asimp_const (V x ) = V x" |
  "asimp_const (Plus a 1 a 2) =
    (case (asimp_const a 1 , asimp_const a 2 ) of
      (N n 1, N n 2 ) => N (n 1+n 2 ) |
      (b 1 ,b2) => Plus b 1 b 2 )"
(* `case` acts like pattern matching *)

Stack Machine and Compilation

Executing a compiled expression (process) is the same as putting the value of the expression on the stack (result).

Notes in Section 4 (Logic and Proof Beyond Equality)

Proof Automation

  • fastforce tries harder than auto
  • force is the slow version of fastforce
  • blast is for complex logical goals
    • is a complete proof procedure for first-order formulas (very cool)
    • knows little about equality
    • sets, logic and relations
  • sledgehammer
    • calls a number of external automatic theorem provers (ATPs) to search for a proof (I think it's just like SAT or SMT solvers)
    • recommended to apply simp or auto before invoking sledgehammer.
  • arith: obviously, is for linear arithmetic fomulas.
  • try:
    • tries them all (calls sledgehammer)
    • a lightweight variant try0

Single Step Proofs

  • Instantiating Unknowns
    • using of
    • by unification
    • using where
  • Rule Application
    • Applying a rule backwards to a proof state
    • e.g.: applying conjI in “A /\ B” results in two subgoals A and
    • usage: apply (rule xyz)
  • Introduction Rules
    • part of the logical system of natural deduction
    • e.g.:
      • conjI iffI: split the goal into two subgoals
      • impI: moves left hand side as assumptions
      • allI: removes all forall by turning the variable into a fixed and local one
    • attribute [intro]: turn other theorems to introduction rules
      • increases the search space
      • can lead to nontermination

Notes in Section 5 (Isar: A Language for Structured Proofs)

By practice.

Isar core syntax

  • proof = by method | proof [method] stdp* **qed**
  • step = fix variables | assume proposition | [from fact+] (have | show) proposition proof
  • proposition = [name:] “formula” fact = name | …

Notes in Section 6-7 (Introduction, IMP: A Simple Imperative Language)

(No more complete isabelle thms, just informal notations)

  • IMP commands:
    • SKIP
    • Assign vname aexp
    • Seq com com
    • If bexp com com
    • While

(Then IMP is turing complete.)

  • c1 ;; c 2 ;; c3 = (c1 ;; c2) ;; c3 (it does not matter whether semicolon associates to the left or to the right)
  • IF and WHILE bind stronger

Big-Step Semantics

(Expand the commands until the end) big-step rules

  • derivation tree (for specific command(s))
  • execute the big-step rules: code generator
    • code_pred big_step
    • values {t. (SKIP. \lambda_. O) => t}
  • rule inversion
    • (SKIP, s) => t ==> t = s
    • for each rule …
  • equivalence of commands
    • w.r.t. c \sim c' === (\forall s t. (c, s) => t = (c', s) => t)
  • Equivalence relation R
  • Deterministic

Small-Step Semantics

(Push the state to next time point) small-step rule

  • Question:
    • What is star ? The actual meaning.
      • if apply function $f$ finite times, and the init state $c$ can be transformed into the target state $c'$, then we have $star~f~c~c'$
      • $P \vdash c \rightarrow\ast~c’ \equiv star~(exec1~P)~c~c'$
  • Equivalence with Big-Step semantics
    • How to prove?
    • for any big-step execution, there is an equivalent small-step execution and vice versa.
  • Final Configurations, Infinite Reductions, and Termination

Notes in Section 8 (Compiler)

Instructions and Stack Machine

  • inst
    • LOADI int
    • LOAD vname
    • ADD [data on stack]
    • STORE vname
    • JMP int
    • JMPLESS int
    • JMPGE int
  • Question: why we need both of last two insts?
  • terms:
    • stack: val list
    • config: (int, state, stack) Configurations, aka 格局; int is an program counter (PC)
  • definition is like we have done before:
    • (star again)

Reasoning About Machine Executions

Aim: to execute machine programs symbolically as far as possible using Isabelle’s proof tools.

  • execution results are preserved when appending additional codes to the left or right
    • right (trival)
    • left (by shift the PC)

Compilation

  • compiler :: exps => instr list
    • for arithmetic expressions
    • for boolean expressions
    • for commands
      • while, if: manipulate the PC (using JMP*)
        • add bcomp two more args: an offset and a flag
  • correctness
    • the evaluation results on (expressions | commands) are the same (semantic preserving)
    • PC's behavior

Preservation of Semantics

correctness proof:

  • the machine program should have precisely the same behaviour as the source program.
  • the same state change and nothing more than that

 Compiler correctness as two simulations

  • upper level: the bit-step execution: (s, c) -> t
  • lower level: stack machine execution
  • left: source code can be simulated by compiled code
  • right: “<–” direction
  • proof
    • by rule induction on the big-step semantics
    • the second direction is harder: lack of a suitable structural induction principle
      • use an outside structural induction on $c$
      • then a nested inner induction on the length of the execution for the WHILE case
  • another lemma about successors: execution of compiled code will never jump outside
    • concepts:
      • isucc: the successors of an instr
      • isucc: the successors of instr list (program)
      • exits: the exit point
    • lemmas:
      • successors over append

Types

Types prevent mistakes.

  • The good Static types that guarantee absence of certain runtime faults. (Java, ML, Haskell)
  • The Bad Static types that have mostly decorative value but do not guarantee anything at runtime. (C, C++)
  • The Ugly Dynamic types that detect errors only when it can be too late. (Perl, Python)

Static type systems can be seen as proof systems, type checking as proof checking, and type inference as proof search. (CH?)

Typed IMP

  • Typed Language
    • two types: int and real (not float)
    • some differences
      • aexp
        • syntex: addition between values with same type
        • taval: typed arithmetic value others are undefined
      • bexp
        • syntex: remain unchanged
        • tbval:
      • commands
        • syntax: unchanged
        • small-step: errors are explcitly visible in intermediate states
        • big-step: can be adjusted (but later)
      • well-behaved -> typed executions look as before, type errors -> get stuck
  • Type System
    • purpose: keep track of the type if each variable and allow only compatible combinations in expressions
    • typing environment: tyenv = vname => ty (map the var name to its type)

Security Type Systems

  • sound: accepted programs are all valid/safe/*
    • anti-monotonicity: [|l |- c; l' <= l |] ==> l' |- c
    • confinement: [|(c, s) => t ; l |- c]] ==> s = t (< l)
    • noninterference: [| (c, s) => s'' (c, t) => t'; O |- c; s = t (<= l) |] ==> s' = t'(<= l)
  • complete: accept all valid/safe/* programs
  • another direction: a bottom-up typing system
  • no clues about termination in such a type system: covert channel
  • Conducting security proofs in a theorem prover only helps against one of these: wrong proofs. Strong theorem statements and realistic model assumptions, or at least explicit model assumptions, are still our own responsibility.
    • A fix: modify the WHILE statement that won't admit anything higher than level 0. Then we have theory termination.

Summary

  • purpose of type systems: prevent errors
  • a type derivation is a proof: type checking performs basic automatic proofs about programs

Program Analysis

Pre

  • Optimization

    • a special case of program transformation
    • the analysis
    • the transformation
  • Error detection

  • Data Flow Analysis

    • initialization
    • constant folding
    • live variable analysis

Definite Initialization Analysis

  • D :: vname set => com => vname set => bool
    • If all variables in $A$ are initialized before $c$ is executed, then no uninitialized variable is accessed during execution, and all variables in $A'$ are initialized afterwards.
    • $\mathcal{D}~c = (\exists A’.~D~{}~c~A’)$
  • Initialization Sensitive Expression Evaluation
    • observe an error when the program uses a variable that has not been initialized
    • use option type
      • type_synonym state = vname => val option
    • lemmas about initialized expressions
      • arithmetic exps.
      • boolean exps.
    • For small-step style
      • $D$ is increasing
      • $D$ is sound
    • Big-step style
      • easy to write big-step semantics
      • with a larger number of big-step rules
      • therefore a larger number of cases in inductive proofs

Constant Folding and Propagation

  • 2 common compiler optimizations)
  • constant folding: computing the value of constant expression at compile time and substituting their value
  • constant propagation: derermining if a variable has constant value and propagating it to the use-occurrences
  • Folding
    • folding on arithmetic expressions
        fun afold :: aexp => tab => aexp where
          afold (N n) _ = N n
          afold (V x) t = (case t x of None => V x | Some x => N x)
          afold (Plus e1 e2) t = (case (afold e1 t, afold e2 t) of 
                                    (N n1, N n2) => N (n1 + n2)
                                  | (e1', e2') => Plus e1' e2')
    
    • correctness of afold:
      • definition approx t s = (forall x k. t x = Some k --> s x = k)
      • approx t s ==> aval (afold a t) s = aval a s
  • Propagation
    • lift the constant table to the commands
    • if case: merge t1 t2 = (\m => if t1 m = t2 m then t1 m else None)
    • sequential case: composition: just the same
    • while case: discard all vars not in LHS, t|>S -> (\x => if x \in S then t x else None).
  • Conditional equivalence:
    • P |= c ~ c' = (\forall s s'/ P s --> (c, s) => s' = (c', s) => s')
    • 3 properties: reflexive, symmetric, transitive
  • Correctness
    • original: fold c empty ~ c
    • from an arbitrary table t: approx t |= c ~ fold c t
      • approx is partially applied!
      • for the proof part: the generalized form is firstly proved
      • then by applying approx empty = (\_ -> True) we have the original lemma proved.

Live Variable Analysis

  • The elimination of assignments to dead variables.
  • As a recursive function: fun L :: com => vname set => vname set
    • X: the set of variables that are live after c
    • return: … before
    • backward
    • as for while (WHILE b THEN c):
      • vars b ⊆ L w X
      • b = False => X ⊆ L w X
      • L c (L w X) ⊆ L w X (hard to prove)
  • Generate and Kill analyses
    • A :: com => t set => t set
    • kill: definitely written in the current command c
      • for example, kill (WHILE b DO c) = {}
    • gen: maybe used in the current command c
      • for example, gen (c1 ;; c2) = gen c1 ∪ (gen c2 − kill c1 )
  • Transform: L c X = gen c ∪ (X - kill c)
    • as for while:
      • L c (L w X) ⊆ L w X
  • Optimization

True Liveness

An exmaple: x := y ;; x := 0, y then dead. Thus the first exmaple is true liveness.

Modify: L (x ::= a) X = (if x ∈ X then vars a ∪ (X − {x}) else X)

The the while case becomes more interestingly: vars b ∪ X ∪ L c (L w X ) ⊆ L w X.

  • Knaster=Tarski Fixpoint Theorem on Sets
    • partial order $\subseteq$
    • fixpoint: $f(p) = p$
    • pre-fixpoint: $f(p) \subseteq p$
    • compute the least pre-fixpoint: $\bigcap \{ p. f(p) \subseteq p \}$
    • as for monotone functions, the least pre-fixpoint is the least fixpoint (done!)
  • Then we have to prove such lemmas:
    • L c (partialy applied) is monotone
    • the correctness of L
    • the computation of least fixpoint:
      • f is monotone
      • A ⊆ U ==> f A ⊆ U
      • then lfp f = while (λA. f A != A) f {}.

Summary

  • Forward/Backward
  • May/Must (soma path / all paths)

Definite iunitialization analysis: forward and must Constant propagation: forward and must Live variable analysis: backward and may

Denotational Semantics

What

  • Big_step is not a true denotational semantics
  • denotational semantics characterised as:
    • types: syntex of syntactic objects, semantics of denotations
    • function: syntex -> semantics defined by primitive recursion
    • in words, the meaning of a compound object is defined as a function of the meaning of its subcomponent
    • examples:
      • aval and bval are denotational definitions
      • but the big-step semantics is NOT.

A Relational Denotational Semantics

  • start from the relations

  • denotational semantics acts like a relation between s and t:

    • Skip => Id
    • WHILE loop is tricky, since it is implemented in a fixpoint way.
    • equivalent to big_step semantics: $(c, s) \Rightarrow t \longleftrightarrow (s, t) \in D~c$
  • usually based on partial functions

    • difficulty: KT Fixpoint Thm. does not apply
    • functions do not form a complete lattice because the union of two functions is not necessarily again a functioin
    • continuity which guarantees least fixpoint also in the space of partial functins
      • chain is a sequence of sets: chains S = (forall i. S i ⊆ S (Suc i)).
      • function is continuous: cont f = (forall S. chains S --> f (forall i. ∪ (S i))) = forall i. ∪ (f (S i))
      • continuous function is monotone.
      • A general version of KT Fixpoint Thm.,Kleene Fixpoint Theorem: cont f ==> lfp f = ∪ n f^n {}
    • single-valued: $\forall x, y, z. (x, y) \in r \wedge (x, z) \in r \longrightarrow y = z$

Hoare Logic

Another semantics, axiomatic semantics: proving properties of imperative programs.

Proof via poerational Semantics

Hoare Logic for Partial Correctness

  • P and Q
    • partial correctness: the post condition is only required to be true only when the program terminates.
      • since False --> Anything
    • total correctness = partial correctness + termination
    • pre- post-conditions: asstertions, 2 approaches
      • Syntactic: assertions are concrete syntactic objects (bexp)
      • Semantic: assertions are predicates on states (state => bool)
  • Syntactic Assertions
    • Proof System
      • Hoare triples are derivable.
  • Assertions as Functions
    • type_synonym assn = state => bool
    • |= {P} c {Q} <-> (∀ s t. P s /\ (c, s) => t --> Q t)
  • Example Proof

Soundness and Completeness

  • Soundness: if a triple is derivable, it is also valid.
    • |- {P} c {Q} ==> |= {P} c {Q}
  • Commpleteness: if a triple is valid it is also derivable.
    • |= {P} c {Q} ==> |- {P} c {Q}
  • weakest precondition: wp

Verification Condition Generation

  • from triple to the assertion (iff) : VCG

Abstract Inrerpretation

(Needs further reading…)