Metaprogramming
and Self-Interpretation

Marvin Borner

2023-11-11

My favorite lambda calculus program is the metacircular self-interpreter from John Tromp’s “Most Functional” winning entry to the 2012 International Obfuscated C Contest (IOCCC)¹. It’s a universal machine for lambda calculus that only needs 232 bits of binary lambda calculus (BLC) – it can evaluate the output of any valid BLC program!

  01010001
   10100000
    00010101
     10000000
      00011110
       00010111
        11100111
         10000101
          11001111
          000000111
         10000101101
        1011100111110
       000111110000101
      11101001 11010010
     11001110   00011011
    00001011     11100001
   11110000       11100110
  11110111         11001111
 01110110           00011001
00011010             00011010

Tromp’s entry inspired me to implement a self-interpreter for the meta encoding of my programming language bruijn. Read my article about bruijn or the recently created wiki first if needed.

This article contains explanations of the encoding, its usage in bruijn (including quasiquotation and macro creation), examples of interacting with the encoding, and finally, after a translation of Tromp’s self-interpreter, the derivation of my self-interpreter (which, as it turns out, only needs 194 bits!). You can find the resulting meta library module here.

Mogensen-Scott

The Mogensen-Scott encoding (MSE) describes a way to encode any lambda calculus term as a meta program that allows lambda terms to parse, modify and traverse themself. Encoding a term as MSE works like this:

$\begin{align*} \mathrm{mse}(x)&=\lambda abc.(a\ x)\\ \mathrm{mse}((M\ N))&=\lambda abc.((b\ \mathrm{mse}(M))\ \mathrm{mse}(N))\\ \mathrm{mse}(\lambda x.M)&=\lambda abc.(c\ (\lambda x.\mathrm{mse}(M))) \end{align*}$

In bruijn, this translation is implemented as syntactic sugar using the quote (backtick) prefix `. All identifiers get substituted before desugaring, such that only pure (unreduced) lambda calculus remains.

`X     ⤳ [[[2 (+Xu)]]]
`(M N) ⤳ [[[1 `M `N]]]
`[M]   ⤳ [[[0 `M]]]

Note that, since bruijn doesn’t have named variables, the translation of variables is done by converting De Bruijn indices to Church numerals.

By encoding De Bruijn indices, the meta encoding resembles the structure of bruijn’s lambda terms exactly, thus making it homoiconic². This also enables modification of indices and it’s why I chose a variation of MSE over other encodings with very minimal self-interpreters³ – they are not mirroring the term’s structure and are not (trivially) traversable or modifiable.

Also, while looking a bit similar, this quoting technique is very different from Lisp’s. Bruijn’s meta encoding uses a tree-like structure of numbers instead of Lisp’s list of symbols.

# example quotations
:test (`0) ([[[2 (+0u)]]])
:test (`[0]) ([[[0 [[[2 (+0u)]]]]]])
:test (`'0') ([[[0 [[[0 [[[0 [[[1 [[[2 (+0u)]]] [[[1 [[[2 (+0u)]]] [[[1 [[[2 (+0u)]]] [[[1 [[[2 (+0u)]]] [[[1 [[[2 (+1u)]]] [[[1 [[[2 (+1u)]]] [[[1 [[[2 (+0u)]]] [[[1 [[[2 (+0u)]]] [[[2 (+2u)]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]])

# quotes are nestable!
:test (``0) ([[[0 [[[0 [[[0 [[[1 [[[2 (+2u)]]] [[[0 [[[0 [[[2 (+0u)]]]]]]]]]]]]]]]]]]]]])
:test (`[0 `0]) ([[[0 [[[1 [[[2 (+0u)]]] [[[0 [[[0 [[[0 [[[1 [[[2 (+2u)]]] [[[0 [[[0 [[[2 (+0u)]]]]]]]]]]]]]]]]]]]]]]]]]]])

MSE is a variation of a standard tagged union data structure. If you’re not familiar with tagged unions in lambda calculus, read this section of one of my previous posts. We can then create some functions for constructing meta terms, checking their type, and extracting their values:

# imports apply to all code snippets in this article
:import std/Logic .
:import std/Pair .

# constructs meta index from unary number
idx [[[[2 3]]]] ⧗ Unary → Meta

# returns true if meta term is index
idx? [0 [true] [[false]] [false]] ⧗ Meta → Bool
 
# extracts unary number from meta index
idx! [0 [0] 0 0] ⧗ Meta → Unary

# constructs meta application from two meta terms
app [[[[[1 4 3]]]]] ⧗ Meta → Meta → Meta

# returns true if meta term is application
app? [0 [false] [[true]] [false]] ⧗ Meta → Bool

# extracts both meta terms from meta application as pair
app! [0 0 [[1 : 0]] 0] ⧗ Meta → (Pair Meta Meta)

# constructs meta abstraction from meta term
abs [[[[0 3]]]] ⧗ Meta → Meta

# returns true if meta term is abstraction
abs? [0 [false] [[false]] [true]] ⧗ Meta → Bool

# extracts meta term from meta abstraction
abs! [0 0 0 [0]] ⧗ Meta → Meta

Passing tests

:test (idx (+0u)) (`0)
:test (idx? `[0]) (false)
:test (idx? `([0] [0])) (false)
:test (idx? `0) (true)
:test (idx! `0) ((+0u))
:test (idx! `3) ((+3u))

:test (app `[[1]] `[0]) (`([[1]] [0]))
:test (app? `[0]) (false)
:test (app? `([0] [0])) (true)
:test (app? `0) (false)
:test (app! `([[1]] [[0]])) (`[[1]] : `[[0]])

:test (abs `0) (`[0])
:test (abs? `[0]) (true)
:test (abs? `([0] [0])) (false)
:test (abs? `0) (false)
:test (abs! `[[0]]) (`[0])

Traversal

Traversing the meta term can be done using a typical depth-first tree folding approach. We pass three callback functions which get called for each respective term type. Recursion is achieved with the y combinator.

:import std/Combinator .

# traverses meta term depth-first
fold y [[[[[rec]]]]] ⧗ (Unary → a) → (a → a → a) → (a → a) → Meta → a
    rec 0 case-idx case-app case-abs
        case-idx 3
        case-app 2 ⋔ (4 3 2 1)
        case-abs 1 ∘ (4 3 2 1)

Example fold usage

We can use the fold function to measure the length of the BLC encoding of a meta term:

:import std/Number/Unary .

# calculates blc length of meta term
length fold idx-cb app-cb abs-cb ⧗ Meta → Unary
    idx-cb add (+2u)
    app-cb (add (+2u)) ∘∘ add
    abs-cb add (+2u)

:test (length `[0]) ((+4u))
:test (length `[[1 1]]) ((+12u))

Similarly, we can convert the meta term to a BLC string directly! Just as John Tromp did for his universal machine, I encode BLC as a Church list of bits, where $0$ is [[1]] and $1$ is [[0]]⁴.

:import std/List .

# true bit
blc1 [[0]] ⧗ LcBit

# false bit
blc0 [[1]] ⧗ LcBit

# converts meta term to blc list
blc fold idx-cb app-cb abs-cb ⧗ Meta → (List LcBit)
    idx-cb [++0 (cons blc1) {}blc0]
    app-cb (cons blc0) ∘ (cons blc1) ∘∘ append
    abs-cb (cons blc0) ∘ (cons blc0)

:test (blc `[0]) (blc0 : (blc0 : (blc1 : {}blc0)))
:test (blc `[[1 1]]) (blc0 : (blc0 : (blc0 : (blc0 : (blc0 : (blc1 : (blc1 : (blc1 : (blc0 : (blc1 : (blc1 : {}blc0)))))))))))

Alpha equivalence

Alpha equivalance determines whether two terms are equivalent after alpha conversion. Alpha conversion renames the colliding and shadowed variables by renaming them respectively or in accordance to another term. For example, in normal lambda calculus $\lambda ab.(a\lambda b.b)$ is alpha equivalent to $\lambda xy.(x\lambda z.z)$ , since $\lambda ab.(a\lambda b.b)\overset{\alpha}{\to}\lambda xy.(x\lambda z.z)$ . With the use of De Bruijn indices, bruijn does not require alpha conversion. Alpha equivalence then comes down to syntactic equivalence.

Therefore, implementing an alpha equivalence function for meta terms is as easy as traversing the terms recursively, checking whether their types are the same at each step, and finally testing whether the De Bruijn indices (meta encoding: Church numerals) are the same:

# returns true if two meta terms are alpha equivalent
α-eq? y [[[rec]]] ⧗ Meta → Meta → Bool
    rec 1 case-idx case-app case-abs
        case-idx [(idx? 1) ⋀? (0 =? (idx! 1))]
        case-app [[(app? 2) ⋀? (app! 2 [[(6 3 1) ⋀? (6 2 0)]])]]
        case-abs [(abs? 1) ⋀? (3 (abs! 1) 0)]

:test (α-eq? `[0 0] `[0 1]) (false)
:test (α-eq? `[0 0] `[0 0]) (true)
:test (α-eq? `α-eq? `α-eq?) (true)

Self-interpretation

As mentioned in the beginning, we now want to create a self-interpreter for the described meta encoding. This self-interpreter should be able to reduce any given encoded lambda calculus term to its normal form (if existing).

The trivial solution would be to just translate the standard beta reduction rules of De Bruijn indexed lambda calculus to bruijn. More specifically, since we don’t want to repeatedly execute the beta reduction until the normal form is reached, we use beta-star reduction that directly finds the normal form (or runs indefinitely if the term doesn’t converge).

Formal beta-star reduction of De Bruijn indexed lambda calculus

We first define the set $\Lambda$ of all valid lambda terms and the set $\Lambda^*$ of all valid normal forms:

$\begin{align*} \Lambda=\ &\{\texttt{λ}M\mid M\in\Lambda\}\\ \cup\ &\{\texttt{(}MN\texttt{)}\mid M\in\Lambda\land N\in\Lambda\}\\ \cup\ &\{\texttt{i}\mid i\in\mathbb{N}_0\}\\ \Lambda^*=\ &\Lambda\setminus\{\texttt{(λ}MN\texttt{)}\mid M\in\Lambda\land N\in\Lambda\} \end{align*}$

Beta-star reduction can be written as the following μ-recursive definitions where $\kappa: \Lambda\times\mathbb{N}_0\to\Lambda$ increments indices depending on its abstraction depth, $\sigma: \Lambda\times\Lambda\times\mathbb{N}_0\to\Lambda$ does the substitutions, and $\beta^*:\Lambda\to\Lambda^*$ combines $\kappa$ and $\sigma$ to achieve beta reduction until convergence:

$\begin{alignat*}{2} \kappa(\texttt{i}, n)&=\texttt{i}&\mid n>i\\ \kappa(\texttt{i}, n)&=\texttt{i+1}&\mid n\leq i\\ \kappa(\texttt{λ}M, n)&=\texttt{λ}\kappa(M, n+1)\\ \kappa(\texttt{(}MN\texttt{)}, n)&=\texttt{(}\kappa(M, n)\kappa(N, n)\texttt{)}\\ \\ \sigma(\texttt{i}, S, n)&=S&\mid i=n\\ \sigma(\texttt{i}, S, n)&=\texttt{i-1}&\mid i>n\\ \sigma(\texttt{i}, S, n)&=\texttt{i}&\mid i<n\\ \sigma(\texttt{λ}M, S, n)&=\texttt{λ}\sigma(M, \kappa(S, 0), n+1)\\ \sigma(\texttt{(}MN\texttt{)}, S, n)&=\texttt{(}\sigma(M, S, n)\sigma(N, S, n)\texttt{)}\\ \\ \beta^*(\texttt{(}MN\texttt{)})&=\beta^*(\sigma(B, N, 0))&\mid\beta^*(M)=\texttt{λ}B\phantom{A}\\ \beta^*(\texttt{(}MN\texttt{)})&=\texttt{(}A\beta^*(N)\texttt{)}&\mid\beta^*(M)=A\phantom{\texttt{λ}B}\\ \beta^*(\texttt{λ}M)&=\texttt{λ}\beta^*(M)\\ \beta^*(\texttt{i})&=\texttt{i} \end{alignat*}$

The mathematical formalization can be translated to bruijn as follows:

# increments De Bruijn indices reaching out of their abstraction depth
κ y [[[rec]]] ⧗ Meta → Unary → Meta
    rec 1 case-idx case-app case-abs
        case-idx [idx ((1 >? 0) 0 ++0)]
        case-app [[app (4 1 2) (4 0 2)]]
        case-abs [abs (3 0 ++1)]

# substitutes a term based on its abstraction depth
σ y [[[[rec]]]] ⧗ Meta → Meta → Unary → Meta
    rec 2 case-idx case-app case-abs
        case-idx [compare-case case-eq case-lt case-gt 1 0]
            case-eq 2
            case-lt idx 0
            case-gt idx --0
        case-app [[app (5 1 3 2) (5 0 3 2)]]
        case-abs [abs (4 0 (κ 2 (+0u)) ++1)]

# reduces a meta term to its normal form (if existing)
β* y [[rec]] ⧗ Meta → Meta
    rec 0 case-idx case-app case-abs
        case-idx idx
        case-app [[3 1 case-idx case-app case-abs]]
            case-idx [app (idx 0) (4 1)]
            case-app [[app (app 1 0) (5 2)]]
            case-abs [4 (σ 0 1 (+0u))]
        case-abs [abs (2 0)]

Passing tests

:test (κ `0 (+1u)) (`0)
:test (κ `0 (+0u)) (`1)
:test (κ `1 (+0u)) (`2)
:test (κ `[0] (+0u)) (`[0])
:test (κ `[1] (+0u)) (`[2])
:test (κ `[2] (+1u)) (`[3])
:test (κ `(1 0) (+1u)) (`(2 0))

:test (σ `0 `2 (+0u)) (`2)
:test (σ `[1] `2 (+0u)) (`[3])
:test (σ `[[4 0]] `2 (+1u)) (`[[3 0]])
:test (σ `[[3] [3]] `0 (+0u)) (`[[2] [2]])

:test (β* `0) (`0)
:test (β* `[0]) (`[0])
:test (β* `([0] [0])) (`[0])
:test (β* `(++(+2u))) (`(+3u))
:test (β* `(β* `(++(+2u)))) (`(+3u))

However, I’m not completely satisfied with β*: If compiled to (optimized) BLC, it’s 4 times the size of Tromp’s self-interpreter ( $\approx120$ B), yet has a higher space/time complexity (e.g. threefold y-recursion). Also, reducing a meta term to a meta term is not as flexible as reducing a meta term directly to a non-meta term.

The basic idea behind a space-efficient self-interpreter is to translate the encoding to an abstract encoding and let the already running reducer do the actual beta reduction (i.e. index shifts and substitution) automatically – metacircularity. For lambda calculus we also need to keep track of all the bindings.

Example OCaml implementation (from Olivier Danvy’s thesis 3.2.4):

type term = IDX of int
          | ABS of term
          | APP of term * term

type value = FUN of (value -> value)

let rec eval (t : term) (e : value list) : value =
  match t with
    IDX n ->
     List.nth e n
  | ABS t' ->
     FUN (fun v -> eval t' (v :: e))
  | APP (t0, t1) ->
     apply (eval t0 e) (eval t1 e)
and apply (FUN f : value) (a : value) =
  f a

let main (t : term) : value =
  eval t []

John Tromp’s implementation⁵ is very similar. Instead of the term type, it uses the BLC encoding. It uses continuations (CPS) to pass and apply its bindings, and fixed-point recursion to call itself recursively. The bindings are represented as a Church list. A cool property of Church lists is that they can be indexed by a chain of [[0]] and [[1]] (which, it turns out, is great for golfing since blc1=[[0]] and blc0=[[1]]!):

lst "abcd"

:test (lst blc0) ('a')
:test (lst blc1 blc0) ('b')
:test (lst blc1 blc1 blc0) ('c')
:test (lst blc1 blc1 blc1 blc0) ('d')

In BLC, an index is identified with 11 and ended by 10. The 11-case therefore pushes blc1 to the binding selector, while the 10-case calls the continuation with [0 blc0]. The initial continuation is the diverging continuation ([0 Ω]) that’s used for open terms (De Bruijn indices reaching out of their environment).

So, to get some inspiration on how actually apply this method to the meta encoding, I translated his universal machine to bruijn:

# reduces a list of blc bits to its normal form lambda term (if existing)
eval-blc ω rec [0 Ω] ⧗ (List LcBit) → a
    rec [[[0 [[[[2 match]]]] (2 2) 1]]]
        match [4 case-0 case-1]
            case-0 2 [1 case-00 case-01]
                case-00 2 [[2 [0 1 2]]]
                case-01 3 [3 [2 0 (1 0)]]
            case-1 0 case-10 case-11
                case-10 1 [0 1]
                case-11 [3 [3 [1 (0 3)]] 4]

:test (length `eval-blc) ((+232u))
:test (eval-blc (blc `[[[1]] [0]])) ([[0]])

This is obviously highly golfed and abstract, so it might take some time to understand⁶. To make understanding easier I also created a (hopefully) more readable translation to Haskell.

Commented Haskell translation

I used John Tromp’s LC paper and his annotated lambda code as resources for the translation.

Since Haskell is typed and BLC is untyped, I needed to do some trickery by abusing recursive types. As a result, every lambda abstraction requires a type wrapper, and every application a special application using the (@) infix. I did my best to reduce golfed lambda logic by introducing verbose variants like cons', head' and tail'.

newtype Term = T (Term -> Term)
type Cont = Term
type LcBit = Term -- BLC 0 = [[1]], 1 = [[0]]
type List = Term -- of LcBit

-- untyped lambda terms
k = T $ \x -> T $ \y -> x
ki = T $ \x -> T $ \y -> y
cons' = T $ \x -> T $ \y -> T $ \f -> (f @ x) @ y
head' = k
tail' = ki
null' = ki
blc0 = k
blc1 = ki

-- untyped lambda application
(@) :: Term -> Term -> Term
T f @ x = f x

-- custom cons infix
(<:>) :: Term -> Term -> Term
x <:> xs = cons' @ x @ xs
infixr 5 <:>

-- diverging term for the end of continuations (e.g. open terms)
omega :: Term
omega = let dup = T $ \x -> x @ x in dup @ dup

evalBlc :: Term
evalBlc = rec @ T (@ omega)
 where
  -- case 00: Abs
  -- prepends a new variable `arg` to bindings list `args`
  -- returns the continuation applied to the decoded term provided
  --   with the new bindings
  abs :: Term -> Cont -> Term
  abs term cont = cont @ T (\args -> T $ \arg -> term @ (cons' @ arg @ args))

  -- case 01: App
  -- calls `rec` again, extracting another decoded term `term2`
  -- returns the continuation applied to the application of the
  --   decoded terms provided with shared bindings
  app :: Term -> Cont -> Term
  app term cont =
    rec @ T (\term2 -> cont @ T (\args -> (term @ args) @ (term2 @ args)))

  -- case 10: Continue
  -- end of index
  -- adds final [[1]] to binding selector
  continue :: Cont -> Term
  continue cont = cont @ head'

  -- case 11: Idx
  -- calls `rec` on `xs` to construct binding selector
  -- extracts a binding selector `var` which is provided with the tail
  --   of `args` of the binding list to obtain the correct selector
  idx :: List -> Cont -> Term
  idx xs cont =
    T $ \list -> rec @ T (\var -> cont @ T (\args -> var @ (tail' @ args))) @ xs

  -- recursive function for continuations
  rec :: Term
  rec = T $ \cont -> T $ \lst -> go (head' @ lst) (tail' @ lst) cont

  -- decides which case to take based on the first two bits
  go :: LcBit -> List -> Cont -> Term
  go bit0 lst cont =
    let bit1  = head' @ lst -- 2nd bit
        xs    = tail' @ lst -- tail of the tail

        case0 = rec @ T (\term -> bit1 @ abs term cont @ app term cont) @ xs
        case1 = (bit1 @ continue cont @ idx xs cont) @ xs
    in  bit0 @ case0 @ case1 -- branch on the first bit

main :: IO ()
main = do
  let x = evalBlc @ (blc0 <:> blc0 <:> blc1 <:> blc0 <:> null')
  -- decoding would be quite complicated
  print 42

Note that this version of interpreter adds an additional abstraction that can be used for monadic IO.

We could combine the blc function with eval-blc to get a meta evaluation function:

# reduces a meta term to its normal form lambda term (if existing)
eval-comp eval-blc ∘ blc ⧗ Meta → a

:test (eval-comp `[[[1]] [0]]) ([[0]])
:test (eval-comp `[(+21u) + (+21u)]) ((+42u))
:test (eval-comp `[eval-comp `[(+21u) + (+21u)]]) ((+42u))
:test (eval-comp (β* `[eval-comp `[(+1u) + (+1u)]])) ((+2u))

The additional blc conversion makes this really inefficient. So, to achieve the final goal of removing the blc translation, we can implement eval directly.

The main idea is to keep the structure of eval-blc, but change the construction of the selection function.

Since we have Church encoded De Bruijn indices, we just need to convert the Church numeral to a selection list! Indexing can then be done with [[head (1 tail 0)]], where (1 tail) replaces all 1s of the numeral with a chain of tail and the outer head subsequently selects the $n$ th element.

The final eval function can then be created. The following is a y-recursion version without the additional abstraction (since we could just hardcode additional abstractions if needed).

# reduces a meta term to its normal form lambda term (if existing)
eval y [[[rec]]] [0 Ω] ⧗ Meta → a
    rec 0 case-idx case-app case-abs
        case-idx [2 [head (1 tail 0)]]
        case-app 2 [3 [3 [2 0 (1 0)]]]
        case-abs 2 [2 [[2 (0 : 1)]]]

# defines the ! prefix as eval
!‣ eval

:test (!`(α-eq? `α-eq? `α-eq?)) (true)
:test (!`((+21u) + (+21u))) ((+42u))

After some minor optimization, the following 194 bit blc code emerges (the code is in the brackets):

01010001                                    00011100
11010000               ######               11100110
10000               ############               00001
01011              #####    #####              00001
11100             ####        ####             00101
01110             ####       #####             00011
00000             ####      ######             10100
00011             ####    ### ####             00111
10000             ####   ##   ####             11111
00001             #### ###    ####             11110
00010             ######      ####             11110
10011             #####       ####             10100
11110             ####        ####             00011
11000              #####    #####              00011
11000               ############               01011
01101110               ######               00011001
00011010                                    00011010

Performance

The efficiency of the eval function obviously depends on the reducer and its reduction technique, for example since some reducers can’t lazily evaluate Church lists like the binding list.

Bruijn, however, uses the RKNL call-by-need reducer and therefore reduces many terms lazily and with memoization. Using bruijn’s REPL :time command, we can do some basic (98% unscientific!) analysis:

:import std/Math .

# factorial of 30
:time (fac (+30))
> "0.2 seconds"
:time !`(fac (+30))
> "1.2 seconds"

# seemingly linear!
:time (fac (+70))
> "4.1 seconds"
:time !`(fac (+70))
> "24.2 seconds"

# nth prime number
:time (primes !! (+20))
> "0.5 seconds"
:time !`(primes !! (+20))
> "8.7 seconds"

# kinda linear!
:time (primes !! (+42))
> "4.0 seconds"
:time !`(primes !! (+42))
> "74.4 seconds"

While it’s obviously much slower than the native reduction, I was actually very surprised when I first saw how fast the meta evaluator is. I wasn’t able to correlate the (seemingly linear) slowdown factor with specific types of terms; it could potentially be decreased in the reducer implementation.

Also interesting:

:time (fac (+30))
> "0.2 seconds"
:time !`(fac (+30))
> "1.2 seconds"
:time !`(!`(fac (+30)))
> "12.0 seconds"
:time !`(!`(!`(fac (+30))))
> "145.6 seconds"

Quasiquotation

If you know Lisp or its dialects, you probably heard of their quasiquote feature. If you quote terms with a backtick (the quasiquote) instead of a normal single quote, you can escape (unquote) the quotation using a comma. The unquoted term then won’t directly be converted to its meta encoding (in Lisp’s case, a symbol), but will be evaluated first and then interpreted as a symbol.

Bruijn defaults to quasiquotation and as such every comma-prefixed term will not be directly meta-encoded. They first get reduced as if they were not quoted at all, and then turned into its meta encoding again.

Unquotes (like quotes) are nestable, so you can requote a term after it has been unquoted.

:test (```,[0]) (``[0])
:test (`,`,[0]) ([0])
:test (`[0 `,[0]]) (`[0 [0]])

Bruijn’s unquote has a special feature though! Instead of turning unquoted De Bruijn indices into their meta encoding, they get shifted to reach out of their meta environment and bind themself to outer abstractions. This shifting makes unquoted indices much more useful.

# adds two using normal quotation
add-two `[0 + (+2u)] ⧗ Meta

# adds two using a shifted De Bruijn index
add-two* [`(,0 + (+2u))] ⧗ Meta → Meta

:test (!add-two (+2u)) ((+4u))
:test (!(add-two* `(+2u))) ((+4u))

# don't quote-nest unquoted indices.

Numeric encoding

This section was added 5 months after publishing.

We can encode meta terms as numbers by using an integer pairing function.

For simplicity and performance I chose the Strong Rosenberg pairing function. The implementation looks as follows:

# strong Rosenberg pairing function
pair-ternary [[[0 ⋅ 0 + 0 + 2 - 1] (max 1 0)]] ⧗ Number → Number → Number

# strong Rosenberg unpairing function
unpair-ternary [[[go] (1 - (0 ⋅ 0))] (sqrt 0)] ⧗ Number → (Pair Number Number)
    go (0 <? 1) (0 : 1) (1 : (1 ⋅ 1 + 1 + 1 - 2))

By pairing the terms recursively with either $0$ , $1$ , or $2$ , we can encode the type of the term alongside the encoding of its body:

# encodes meta term as ternary number
encode fold idx-cb app-cb abs-cb ⧗ Meta → Number
    idx-cb (pair-ternary (+0)) ∘ unary-to-ternary
    app-cb (pair-ternary (+1)) ∘∘ pair-ternary
    abs-cb pair-ternary (+2)

# decodes ternary encoding back to meta term
decode z [[unpair-ternary 0 go]] ⧗ Number → Meta
    go [[(case-idx : (case-app : {}case-abs)) !! 1 0]]
        case-idx idx ∘ ternary-to-unary
        case-app [unpair-ternary 0 (app ⋔ 4)]
        case-abs abs ∘ 3

Examples:

# encoding terms as numbers
:test ((encode `(0 0)) =? (+3)) (true)
:test ((encode `[0]) =? (+8)) (true)

# decoding numbers to terms
:test (decode (+3)) (`(0 0))
:test (decode (+8)) (`[0])

See Helmut Brandl’s and Steve Goguen’s related work for more information.

Self-modification

For the meta encoding to reach its full potential for self-modifying programs, we obviously need some helper functions.

# sets lhs of meta application
lhs [[1 [0] [[[[1 4 2]]]]]] ⧗ Meta → Meta → Meta

:test (lhs `(1 0) `0) (`(0 0))

# sets rhs of meta application
rhs [[1 [0] [[[[1 3 4]]]]]] ⧗ Meta → Meta → Meta

:test (rhs `(0 1) `0) (`(0 0))

# swaps terms of meta application
swap [0 [0] [[[[1 2 3]]]]] ⧗ Meta → Meta

:test (swap `(1 0)) (`(0 1))

# applies a function to the body of a meta term
map [[0 case-idx case-app case-abs]] ⧗ (Meta* → Meta) → Meta → Meta
    case-idx [idx (2 0)]
    case-app [[app (3 1 0)]]
    case-abs [abs (2 0)]

:test (map inc `0) (`1)
:test (map (map inc) `[0]) (`[1])
:test (map swap `[0 1]) (`[1 0])

You can now modify any meta term or meta sub-term!

Usecases

Typical usecases involve macros. People are in love with Lisp’s macro system. Since bruijn doesn’t use modifiable symbols, has no compile-time defmacro, nor can functions have side effects, bruijn’s quoting is not nearly as power-/useful as Lisp’s. It can still be used for basic macros.

With self-modification and self-interpretation, metaprogramming can give you code “factories” that dynamically produce code based on the program’s input or state. If you want to see this escalate, watch Nada Amin’s Strange Loop talk “Programming Should Eat Itself”.

Metaprogramming could also lead to a self-hosting interpreter of bruijn (hint, hint.. ?).

$❦$

Thanks for reading. Contact me via email. Discuss on reddit. Support on Ko-fi. Subscribe on RSS.

You may also recognize it from Justine Tunney’s popular blog post where it was featured alongside other obfuscated functions.↩︎
The misattribution of homoiconicity is strangely often hotly debated. I think it’s generally agreed upon that Lisp’s quotable expressions lead to homoiconicity. As a consequence, I believe, bruijn’s self-reflecting meta encoding also makes it homoiconic.↩︎
Other encodings than normal MSE: Mitchell Wand explored a technique using a different higher order abstract syntax (HOAS) in their paper “The Theory of Fexprs is Trivial”. Aaron Stump improved upon this by applying quotation and evaluation to Wand’s system in their paper “Directly reflective meta-programming”. Mogensen created another encoding that can be self-interpreted using just [0 [0] [0]]. None of these HOAS use Church-encoded De Bruijn indices.↩︎
One could argue that this encoding seems inverted and I’d agree. However, to quote John Tromp after I talked to him about his decision: “BLC represents bit0 as true, because i like to view booleans as \x_0 . \b_1 . x_0 and \x_0 . \x_1 . x_1”. Makes sense.↩︎
I talk about “John Tromp’s implementation”, since he created the presented golfed self interpreter. The general method of using continuations this way was, I believe, first used in John Reynolds Definitional Interpreters.↩︎
Hints for understanding: Read Tromp’s LC paper (page 6+) and (maybe) the papers mentioned in [2].
Hints for translation: Unwind the omega recursion to normal recursion. Rewrite the Church list behavior with pattern matching (e.g. applying a function to a Church list applies the function with the list’s head and tail). Translate the Church boolean logic of BLC bits to normal if-then-else logic. Understand the usage of continuation passing style. Have a look at John’s annotated version.↩︎

Imprint · AI Statement · RSS