Tiny, Untyped Monads

Marvin Borner

2024-11-27

Monads are structures commonly used to abstract over the explicit passing of context, thus making programs cleaner and easier to understand. However, their underlying implementation is often described overly complicated or is hidden in a mess of convoluted types and instantiations.

If you go to the roots of a monad, you will find surprisingly simple mechanisms. After experimenting with monads myself in bruijn – an untyped, pure language – I found some beauty in purely functional, yet untyped, implementations that I want to share. As a side effect, the monads become so small that most definitions take only a few characters!

I’ve received some constructive criticism about the inaccessibility of my writing, so here I use a common JavaScript syntax instead of bruijn and try to explain myself better. You can find all the functions as a library in this repository.

(also, none of this should be used in production, this is solely for fun and education)

Tagged Unions

I start with a primer on functional data structures – we want to encode state without using structures like objects or arrays! We also don’t care about any potential type checks.

A data structure can do (at least) two things: Store data, and extract data. For storing, we need to make sure that the data itself, when interpreted as a function, never gets called – otherwise the data may not be recoverable (or it just throws an error). For extracting, we will use arguments that I like to call selectors. We use these selectors in such a way that the stored data gets applied to the respective selector, thus making our data easily extractable.

For example, let’s say we want a data structure that can hold a single item of two different “types” (or tags!) – a person and an animal. We want the structure to give us information on which type of item it currently stores and we want a method to extract the item of the specific type.

We need two different selectors, one for either type, which are basically just functions passed as arguments that we call with the stored value. To store a value as the type “person” or “animal”, we use two different constructor functions with an argument value:

// Person constructor
Person = value => person => animal => person(value)

// Animal constructor
Animal = value => person => animal => animal(value)

// Examples
examplePerson = Person("Lars") // result: person => animal => person("Lars")
exampleAnimal = Animal("Duck") // result: person => animal => animal("Duck")

To find out whether the constructed data is a person or an animal, we can use the selector functions. Since the selector function gets applied to the value inside the structure, we ignore the argument and then return the boolean:

// Person check
isPerson = personOrAnimal => personOrAnimal(_ => true)(_ => false)

// Animal check
isAnimal = personOrAnimal => personOrAnimal(_ => false)(_ => true)

console.log(isPerson(examplePerson)) // true
console.log(isPerson(exampleAnimal)) // false

Extracting the data from the structure is done in a similar way:

// Person extraction
getPerson = person => person(value => value)(_ => _)
//                                  ---------^^^^^^----------
//                                  this argument is ignored!

// Animal extraction
getAnimal = animal => animal(_ => _)(value => value)

console.log(getPerson(examplePerson)) // "Lars"
console.log(getAnimal(exampleAnimal)) // "Duck"

This is of course a very minimal example that can be extended arbitrarily. For example, let’s say that a person has multiple properties, like a name and an age. We could now also extend the animal constructor in order to maintain the symmetry, or keep it and modify the functions as follows:

// Person constructor
Person = name => age => person => animal => person(name)(age)

// Person extraction
isPerson = personOrAnimal => personOrAnimal(_ => _ => true)(_ => false)
//                                        --^-  -^-    -----^------
//                                        name  age    animal value

// Person extraction
getPersonName = person => person(name => age => name)(_ => _)
getPersonAge  = person => person(name => age => age)(_ => _)

// And similarly, adapting the ignored argument count:
Animal    = value => person => animal => animal(value)
isAnimal  = personOrAnimal => personOrAnimal(_ => _ => false)(_ => true)
getAnimal = animal => animal(_ => _ => _)(value => value)

I hope you can see the elegancy and power of this encoding. In fact, many formally studied encodings of pure lambda calculus (which is basically the above – a bunch of anonymous functions) come down to this exact principle of using multiple arguments and a selector function!

For example:

// Church pair constructor
// Single tag: Selector (s)
ChurchPair = a => b => s => s(a)(b)
//           ^----^   -^----^-
//           values   selector

// Church numeral
// Two tags: Successor (s) and Zero (z)
// The stored value is in z via a composition of selectors
churchThree = s => z => s(s(s(z)))
//           -^----^-
//           selector

// Scott numeral
// Two tags: Successor (s) and Zero (z)
// The stored value is another Scott numeral!
scottThree = s1 => z1 => s1(s2 => z2 => s2(s3 => z3 => s3(s4 => z4 => z4)))
//           -^----^-
//           selector

Maybe

The Maybe monad is very common and appears in most modern languages in some way or another, sometimes with the name Option. It can store either one or zero elements and supports checks for whether an item is stored or not.

For example, if a division of two numbers typically returns another number, you can modify its return type to a Maybe, such that it includes the special case of dividing by zero, where the function should not return anything. These two different states are typically called “Just” and “Nothing”, or “Some” and “None”:

// Returns a Maybe: Either "Nothing" or "Just(<value>)"
function divide(a, b) {
    if (b === 0)
        return Nothing
    return Just(a / b)
}

We can encode this structure as tagged unions!

Nothing   = nothing => just => nothing
Just = v => nothing => just => just(v)
//   --^--                   --^^^^--
//   value                   selector

isNothing = maybe => maybe(true)(_ => false)
isJust    = maybe => maybe(false)(_ => true)
//                          ------^------
//                          ignored value

getValue = just => just()(v => v)

// or even:
prettyMaybe = maybe => maybe("Nothing")(v => "Just " + v)

console.log(prettyMaybe(Nothing)) // "Nothing"
console.log(prettyMaybe(Just(42))) // "Just 42"

We could now already work with the division function from above:

result = divide(42, 0)
if (isNothing(result))
    console.error("OH NO!")
else
    console.log(getValue(result))

When using monads this way, you would need many, potentially nested, if statements. However, this is where the magic of monads would normally set in! Exactly this ugly, explicit tracking of state (here “is nothing”) is what monads try to eliminate.

If JavaScript had native support for monads, the syntax for consecutive actions may look like this:

input = +prompt("Enter a number!")
do {
    a <- divide(42, input)
    b <- divide(42, a)
    c <- divide(b, a)
    return(c)
} // either Nothing or Just(c)

Within the do, the actions get chained together and stored in variables. However, while divide returns a Maybe monad, the individual variables are in fact numbers! The numbers are automatically extracted from Just and then put into the next statement. If any division would return Nothing, the entire chain would break and return Nothing.

This behavior inbetween actions such as divide is defined by the bind operation. The final return (hereafter called unit, because of name conflicts) again wraps the value in a monad (here via Just).1

Specifically, the bind function does two things: Extract the value of the monad (if existing), and apply it to a given function. In typed languages this function is then required to return a monad again, such that the final result of the bind is always a monad.

unit = Just

// maybe is the Maybe monad, f is the function
bind = maybe => f => {
    if (isNothing(maybe)) return Nothing
    else return f(getValue(maybe))
}

This is not functional enough! Instead, try to understand why the following definition is equivalent:

bind = maybe => f => maybe(maybe)(f)

The previous do {} block can now be translated using multiple binds, which is basically what functional programming languages with such syntax desugar to as well:

input = +prompt("Enter a number!")

bind(divide(42, input))(a =>
bind(divide(42, a))(b =>
bind(divide(b, a))(c =>
unit(c))))

Either

The Either monad is very similar to the Maybe monad. Instead of storing either nothing or a value, the Either monad stores two different values with either the tag “Left” or “Right”. For example, following the example from above, let’s say our divide function should actually return an error message in the division-by-zero case instead of returning nothing:

// Returns an Either: Either "Left(<value>)" or "Just(<value>)"
function divide(a, b) {
    if (b === 0)
        return Left("Error: division by zero")
    return Right(a / b)
}

The required functions should now be fairly obvious:

Left  = v => left => right => left(v)
Right = v => left => right => right(v)
//    --^--                  -^^^^^--
//    value                  selector

isLeft  = either => either(_ => true)(_ => false)
isRight = either => either(_ => false)(_ => true)
//                         ^-----------^
//                         ignored value

getLeft  = left  => left(v => v)()
getRight = right => right()(v => v)

// or even:
prettyEither = either => either(v => "Left " + v)(v => "Right " + v)

console.log(prettyEither(Left("error"))) // "Left error"
console.log(prettyEither(Right(42))) // "Right 42"

Chaining the Eithers monadically should work by only passing values tagged as “Right”, while the “Left” case is returned immediately:

// returned values should be tagged as "Right"
unit = Right

// either is the Either monad, f is the function
bind = either => f => {
    if (isLeft(either)) return either
    else return f(getRight(either))
}

// or, minified:
bind = either => f => either(Left)(f)

Nested binds will now work as intended:

input = +prompt("Enter a number!")

bind(divide(42, input))(a =>
bind(divide(42, a))(b =>
bind(divide(b, a))(c =>
unit(c))))

// input=42 => Right(42)
// input=0  => Left("Error: division by zero")

Syntax

Before we continue with more complex monads, I want to introduce a nicer syntax.

As you’ve probably noticed, writing bind all the time becomes annoying and unreadable. That’s why most languages with monads have a syntax such as the do notation shown above. Inspired by Magnus Tovslid’s work, I’ve implemented something similar using JavaScript’s generators:

DO = (unit, bind) => f => {
    const gen = f()
    const next = acc => {
        const {done, value} = gen.next(acc)
        return done ? unit(value) : bind(value)(next)
    }
    return next()
}

// specific DO instance for Either:
doEither = DO(unit, bind)

With this, we can use a slightly weird, but much more readable do notation:

input = 42
result = doEither(function* () {
    a = yield divide(42, input)
    b = yield divide(42, a),
    c = yield divide(b, a),
    return c;
})
console.log(prettyEither(result)) // "Right 42"

State

Now onto the more complex monads!

A State monad is useful if you don’t want to use mutable state but still want an elegant way of attaching state to your functions.

Let’s say you have a seeded random number generator and you want to generate three consecutive random numbers. With mutable state, this could be solved like this:

rng = max => seed => (1103515245 * seed + 12345) % max

seed = 161
rand = () => (seed = rng(1000)(seed), seed)

console.log([rand(), rand(), rand()]) // [790, 895, 620]

Instead, we construct a bind that connects the rand calls by “mutating” the state automatically.

The State consists of two values, the current state (of course), and another additional variable that can be used for things like accumulating data – otherwise, chained actions could only ever pass the (albeit modified) state. We store these two values in a Church pair, which is nothing more than a tag (selector) with two values.

State = v => st => s => s(st)(v)
//    --^-- -^^-- -^----^-
//    value state selector

In bind, the function in the first argument (e.g. a state transformer like rand) is then applied to the current state and produces a new state alongside a value. The value and new state is then passed to the next function in the binding chain (the second argument of bind).

/*
 * Pseudocode:
 * bind = run => f => s0 => {
 *    (s1, v) = run(s0)
 *    return f(v)(s1)
 * }
 */

bind = run => f => s0 => run(s0)(s1 => v => f(v)(s1))
//                             --^^^^^^^^^^--
//                             uncurry Church

unit = State

// specific DO instance for State:
doState = DO(unit, bind)

We can now translate the previous example to use monadic state:

// initialize rand with a State tuple with two initial numbers
rand = seed => (g => State(g)(g))(rng(1000)(seed))

threeNumbers =
    bind(rand)(a =>
    bind(rand)(b =>
    bind(rand)(c =>
    unit([a, b, c]))))

// or, simply:
threeNumbers = doState(function* () {
    const a = yield rand
    const b = yield rand
    const c = yield rand
    return [a, b, c]
})

console.log(threeNumbers(161)(st => v => v)) // [790, 895, 620]
//                      -^^^  ^^^^^^^^^^^^
//                      seed    selector

State monads are not only useful for generating random numbers, though. For example, it’s easy to derive a kind of “Writer” monad, that can accumulate logs while working with other data in parallel:

log = (a, str) => st => s => s(st + str)(a)

deepthought = doState(function* () {
    const answer = yield log(42, "Finding answer... ")
    const correct = yield log(answer == 42, "Checking answer... ")
    if (correct) yield log(null, "Is correct!") 
    else yield log(null, "Is false!")
    return answer
})

console.log(deepthought("")(log => answer => ({answer, log})))
// { answer: 42, log: 'Finding answer... Checking answer... Is correct!' }

You can find some more advanced examples using get, put, modify, ap, and fmap on GitHub.

IO

Pure languages often use monads for IO, since tracking the inputs and outputs purely would otherwise result in horribly confusing code. However, the internals of typical implementations are also described as “deeply magical”2, so I show a slightly simplified version.

The IO Monad doesn’t normally make sense in a language like JavaScript, where side effects like prompt, alert, or console.log can be triggered from anywhere. Still, having all impure code (the IO magic) in a single place allows us to change the entire essence of our side effects with very small changes, for example if we want to support several kinds of IO.

Let’s say our IO is based on being able to read and write single characters. We can model the monad as an extension to the State monad:

doIO = doState

// here we could also add file opening, reading, writing, etc.
read =        st => s => s(st)(st.read())
write = ch => st => s => s(st)(st.write(ch))
//                     ----^^--^^----
//                     obj of effects

The state here is an object of available IO effects. The value that’s carried along the bind is the result of calling either one of them. There are many different solutions of using this state. We could, for example, track all the IO calls purely, and then lazily evaluate them all in one go at the end. Since JavaScript isn’t lazily evaluated, it makes more sense to call the impure IO effects immediately.

Via recursion, we can then define actions for reading and writing entire lines:

writeLine = str => doIO(function* () {
    const head = str[0]
    const tail = str.slice(1)

    yield write(head)
    yield tail === "" ? write('\n') : writeLine(tail)
})

// I don't think doIO via yield is powerful enough for this(?)
readLine = bind(read)(ch =>
    ch === '\r' ? unit("")
                : bind(readLine)(line => unit(ch + line))
)

Example usage:

Person = name => age => person => person(name)(age)

constructPerson = doIO(function* () {
    yield writeLine("Please enter your name!")
    const name = yield readLine
    yield writeLine(`Hello ${name}! Now please enter your age.`)
    const age = yield readLine
    return Person(name)(age) // arbitrary data!
})

Finally, when executing IO actions, we need to pass the initial state that contains the impure IO effects:

// cli effects for nodeJS:
// Note the modularity and how you could swap effects arbitrarily!
nodeEffects = () => {
    fs = require("fs")
    process.stdin.setRawMode(true)
    buffer = Buffer.alloc(1)
    fd = fs.openSync("/dev/tty", "rs")
    return {
        write: process.stdout.write.bind(process.stdout),
        read: () => {
            fs.readSync(fd, buffer, 0, 1)
            return buffer.toString("utf8")
        }
    }
}

console.log(constructPerson(nodeEffects())(st => v => v(
    name => age => `Person(name: ${name}, age: ${age})`
)));

// output: "Please enter your name!"
// input: "Marvin"
// output: "Hello Marvin! Now please enter your age."
// input: "21"
// log: "Person(name: Marvin, age: 21)"

Parser

Parsing is another topic where monads come up a lot. If you want to parse a language or some other data, you will need to keep track of the already parsed data as well as the remaining unparsed data. Mutable state makes this seem trivial at first, but at the latest when you descend recursively into a structure with multiple paths, you will probably need to consider backtracking (which could involve rolling back all mutated state to a previous version).

Tracking such data immutably via a parser monad can lead to an elegant yet powerful coding style – for example using parser combinators, where the different parsing steps can be combined modularly via single (often infix) functions.

A minimal parser monad can be constructed as follows: Since our parser can fail, we store all data in an Either monad. Its left case contains an error (for us: a string), and the right case contains a (Church) pair of the already parsed data and the rest of the unparsed input.

Left  = v => left => right => left(v)
Right = v => left => right => right(v)
eitherBind = either => f => either(Left)(f)

fail = err => s => Left(err)

The monadic bind then consists of applying the parser p to the input s. If the result is Right, a parsing function f should be applied to the already parsed data and the unparsed rest of the input:

bind = p => f => s => eitherBind(p(s))(right => right(cur => rst => f(cur)(rst)))

unit = cur => rst => Right(s => s(cur)(rst))

doParse = DO(unit, bind)

For parsing something based on a specific predicate, we can define a function satisfy:

satisfy = pred => s => {
    if (s === "") return Left("end of input")
    const head = s[0]
    const tail = s.slice(1)
    return pred(head) ? Right(s => s(head)(tail))
                      : Left("unexpected " + head)
}

Using this, we can parse a single char as well as a complete string:

char = ch => satisfy(c => c == ch)

string = str => doParse(function* () {
    const head = str[0]
    const tail = str.slice(1)
    yield char(head)
    return yield tail === "" ? unit(str)
                             : bind(string(tail))(_ => unit(str))
})

Then, let’s say the input to the parser is a string “Hello, World!”. We can run several parsers on it:

prettyParser = either =>
    either(v => "Error: " + v)(v => v(cur => rst => ({ cur, rst })))

input = "Hello, World!"

parser = char('H')
console.log(prettyParser(parser(input)));
// { cur: 'H', rst: 'ello, World!' }

parser = char('h')
console.log(prettyParser(parser(input)));
// Error: unexpected H

parser = doParse(function* () {
    const p = yield string("Hello")
    yield char(',')
    yield char(' ')
    return p
})
console.log(prettyParser(parser(input)))
// { cur: 'Hello', rst: 'World!' }

console.log(prettyParser(parser("Hallo, Welt!")))
// Error: unexpected a

Now, of course this is really only scratching the surface of what parsers are actually supposed to do – it should still serve as a good starting point though. You can find some further definitions based on the same idea in bruijn’s standard library.

That’s everything for now, thanks for reading. Contact me via email. Support on Ko-fi. Subscribe on RSS. Follow on Mastodon. Program in bruijn.

Also, I’ll be at 38c3 in Hamburg next month giving a talk on large numbers, let me know if you want to meet (DECT 5063).


Imprint · AI Statement · RSS