Digging into Language Design: Rholang Semantics

Dec 18, 2018

I recently gave a hefty start to building the Rholang semantics using K-Framework. I want to share some insights I gained and the battles I faced. Perhaps this will help someone else along their way.

For those unfamiliar with Rholang, it’s a language built for the RChain blockchain. It’s special and a bit strange because it evaluates all computation by sending messages using a set of named channels. Also, all computation happens concurrently, which can introduce some interesting non-determinism. You send messages to channels using processes. The syntax of a program looks something like Proc1 | Proc2 | ... | ProcN, where each ProcK is a process and the | symbol indicates that the processes are running in parallel. Rholang finishes evaluating a program when it’s done sending or receiving all the messages. Programming in Rholang takes a bit of getting used to, but it sure is fun.
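To get a feel for the language, here is a small illustrative program (my own example, not from the original post): one process sends the value 42 on a channel x while, in parallel, another process waits to receive on x. Once the message passes, both sides reduce to Nil and evaluation finishes.

```rholang
new x in {
  x!(42) |
  for (y <- x) { Nil }
}
```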

For those unfamiliar with K-Framework, it’s a language for defining the semantics of programming languages. You write a syntax module, where all the language syntax goes. And, you write a semantics module, where you tell the Framework what each bit of syntax means. In other words, you describe how it should rewrite syntax. With this information, K-Framework makes a compiler for your language, along with some other useful tools that let you prove things about code in your language.

Writing in K appears simple but is deceptively difficult. It boils down to defining syntactic patterns and then telling K how to rewrite them when it sees them. K is special in that it uses matching logic, the mathematical theory that any programming language can be defined by a set of syntactic constructs and rewrite rules. Matching logic says that you can do all computation by defining rewrite rules for any term. Computation, then, is rewriting.

The hope is that, after K applies these rewrite rules to any syntactically correct program, it will correctly evaluate programs—by rewriting terms—in such a way that they behave like we would expect. If we have, for example, Haskell semantics and we want to study and prove things about the behavior of Haskell programs, we would run them using K. If the semantics are correct, the compiler that K produces via these rewrite rules should be behaviorally identical to whatever standard compiler we’re using to run our code in practice.

For example, in Rholang we have the empty process Nil, which does nothing (sends/receives no messages). You might define a syntactic category called Proc (for “process”) with Nil as a syntactic construct in that category. The code would go something like this:

syntax Proc ::= Nil
rule Nil => .K

The first line defines the syntax, telling K that Nil is a valid program. The second line gives Nil meaning via a rewrite rule: any time K encounters a Nil in the program, it rewrites it to .K, the symbol for nothing, an empty program.

Simple enough, right? Not so fast. Programming languages expressed in K can get complicated very quickly. As you go along defining your language, your syntax module gets large and your semantics modules (those containing the rules) grow larger. K-Framework requires a rigorous, coherent rewrite theory to define a language, meaning that the rules you write can be applied in any order and should still produce the same behavior. This makes writing code for a sequential process tricky, to say the least.

For example, let’s try to write the syntax and semantics of a simple operator that compares a set of parallel Rholang processes against a pattern. In Rholang, parallel processes have the form Proc1 | Proc2 | ... | ProcN, where each ProcK is a Rholang process (Nil, for example). A pattern will have the form Patn1 | Patn2 | ... | PatnN, where each PatnK is a Rholang pattern that may or may not match a set of parallel processes. A pattern could be a process, or it could include variables. We’ll write this operator as #matches; it returns true if the processes match the pattern, and false otherwise. In our examples we’ll use an auxiliary function, #pattern-matches, that tells us whether a particular process ProcK matches a particular pattern PatnK.
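In K, these two operators would need syntax declarations before any rule can mention them. A minimal sketch, assuming sorts named Proc and Patn and Boolean results (the exact productions are my guess, not taken from the real Rholang semantics):

```k
syntax Bool ::= Proc "#matches" Patn
              | Proc "#pattern-matches" Patn
```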

If you know a priori how many processes and patterns are in parallel, it’s not hard to write a rule to check if they match. For example, if we’re dealing with three processes and three patterns, our rule might look like this:

rule (Proc1:Proc | Proc2:Proc | Proc3:Proc) #matches (Patn1:Patn | Patn2:Patn | Patn3:Patn)
    => #if Proc1 #pattern-matches Patn1 #then
         #if Proc2 #pattern-matches Patn2 #then
           #if Proc3 #pattern-matches Patn3 #then true #else false #fi
         #else false #fi
       #else false #fi

Essentially, we go one by one and check them all. To generalize the rule to any number of parallel processes/patterns, clearly we should use some sort of recursion: check the first pattern/process pair, then the second, then the third, etc. For someone like me, with just a pure math background and unfamiliar with many standard programming strategies, this was a nontrivial problem to figure out. My first intuition was to use cells. These are the compartments that K uses to calculate and model things in a language. There is always a k-cell, where the computation happens. There can also be cells for a stack, a heap, or any number of other things. Cells are written in XML-like tags, such as <k> … </k> . A rule using cells might look like this:

// When we encounter an instance of `#matches`, put in a
// placeholder while computation in the cell happens.
rule <k> ParallelProcs #matches ParallelPats => #matchPlaceholder ... </k>
     (.Bag => <matchCompare> ParallelProcs ; ParallelPats </matchCompare>)

// General comparison step
rule <matchCompare>
       Proc1 | Procs ; Patn1 | Patns
       => #if Proc1 #pattern-matches Patn1 #then
            // So far so good. Now compare the rest of the processes/patterns.
            Procs ; Patns
          #else
            false // They don't match.
          #fi
     </matchCompare>

// If we're only comparing one process/pattern
rule <matchCompare>
       Proc1 ; Patn1
       => #if Proc1 #pattern-matches Patn1 #then
            true  // They match!
          #else
            false // They don't match.
          #fi
     </matchCompare>

// Resume computation, rewriting ParProcs #matches ParPats to the resulting Boolean
rule <k> #matchPlaceholder => B ... </k>
     // Delete the matchCompare cell
     (<matchCompare> B:Bool </matchCompare> => .Bag)

The first rule grabs the ParallelProcs #matches ParallelPats and puts a placeholder into the computation so K doesn’t move forward until we figure out if ParallelProcs and ParallelPats match. The second is the general, recursive step for comparing processes/patterns. The third treats the case where there’s just one process/pattern, and the fourth resumes computation in the k-cell.

Two things here: First, unlike programming languages like Haskell, recursion doesn’t have beautiful syntax; you’ve got to do it all yourself. Second, if writing rules for #matches requires its own cell and that many rules, how many cells do we need to define something like Haskell, C, or Rholang? The answer: if you do it this way, SO MANY. Too many. Way too many. You want your semantics to be minimalist, simple, and clean. Someone down the line is going to try to use what you wrote to prove things about code. Proving things is hard enough; using messy semantics, even if they’re behaviorally correct, does not help that effort.

When I started the Rholang semantics, my inexperience betrayed me. I got into a mess, using too many cells and rules to perform simple computations. There is a better way to do all of this—one that will make the Rholang semantics beautiful, coherent, and simple. To do so, you have to define strict functions. These are functions that evaluate their arguments before executing.

If you do this, you can significantly simplify the code from before.

rule Proc1 | Procs #matches Patn1 | Patns
    => #if Proc1 #pattern-matches Patn1 #then
         Procs #matches Patns
       #else false #fi

rule Proc1 #matches Patn1
    => #if Proc1 #pattern-matches Patn1 #then true #else false #fi

Easy as that. Well, almost. If we do it this way, we need to force K to evaluate these expressions in the middle of a computation, without putting them into cells to do it. You can do this by declaring the operators strict, which is simple enough. If, for example, we wanted an if ... then ... statement in our language, where #pattern-matches can be used in the condition, we might define it as follows, with a [strict] attribute.

syntax Proc ::= "if" Bool "then" Proc [strict]
syntax KResult ::= Bool

The first line defines the syntax and the second line, with KResult, tells K that it’s done calculating a strict argument when it sees a boolean. Easy enough, right? That will work like a charm in this case, but again things can get complicated quickly.
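To round out the sketch, here is how the full definition might look once we add the rewrite rules that consume the evaluated condition. The rules are my illustration, not from the original post; with no else branch, I let a false condition reduce to Nil:

```k
syntax Proc ::= "if" Bool "then" Proc [strict]
syntax KResult ::= Bool

// Once [strict] has evaluated the condition down to a Bool (a KResult),
// these ordinary rewrite rules can fire.
rule if true  then P:Proc => P
rule if false then _:Proc => Nil
```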

In Rholang, the mammoth task is to write a full-fledged pattern-matcher in K that’s sleek—ideally one that uses as few cells as possible. Pattern-matchers are hard! As I mentioned before, I started by using cells. Lots of cells. I did it recursively, sort of like the example before. I never quite finished; there were too many special cases, variable scopes, and other things to worry about.

Once I realized that there must be a better way, I started working on a structural type system that would make pattern-matching into a nice, easy-to-state theory that K might be able to use. The trick then was: how do I generate types without using any cells at all? (I suspect Rholang can be defined with only a small handful of cells.) The same recursive problem is there.

I eventually figured out that one could use strict functions to generate types. The function I defined is called #type. It’s a strict identity function, meaning that it will grab an argument, evaluate it, and then spit it out. The definition is simple (notice the [strict] attribute on the syntax production):

syntax Type ::= "#type" "(" Type ")" [strict]
rule #type( T:Type ) => T

There’s some nuance to figure out, though, which is this KResult thing that we mentioned before. How do we tell K that it’s done evaluating its argument before executing the (identity) function #type? If you say KResult should be the syntactic category Type, which is the category of the input, then K won’t do anything and the type system will be worthless. The answer is to define three syntactic categories, InitialType, IntermediateType, and TerminalType. The first triggers K and lets it know to start computing, the second has all the rewrite rules, and the last tells K that it’s done and it can spit out the result. In essence, you let the syntactic category of the input evolve as the strict computation happens to produce something of the final category exactly when the computation is done and we can execute #type.
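Here is a hedged sketch of that three-sort setup. The sort names come from the text; the constructors typeOf, parType, and nilType, and the example rule, are my own illustrations of the pattern, not the real Rholang semantics:

```k
// All three categories are Types.
syntax Type ::= InitialType | IntermediateType | TerminalType

// InitialType kicks things off: wrapping a process tells K to start computing.
syntax InitialType ::= "typeOf" "(" Proc ")"

// IntermediateType holds partially computed types; the rewrite rules live here.
syntax IntermediateType ::= "parType" "(" Type "," Type ")"

// TerminalType holds finished types. Declaring it a KResult tells K that
// the strict argument of #type is fully evaluated.
syntax TerminalType ::= "nilType"
syntax KResult ::= TerminalType

// An illustrative rewrite: the empty process types straight to a terminal type.
rule typeOf( Nil ) => nilType
```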

What comes out of it is really, really cool. You can generate huge types—for statements, terms, or even entire programs—without ever needing a single cell. Here’s the code from configuration.k on my GitHub repo:

imports RHO-SYNTAX

// Configuration
    // Concurrent threads for processes to run
    <thread multiplicity="*">
        <k color="red"> $PGM:Proc </k>
    </thread>
    // The tuple space
    <In multiplicity="*">
        <mailfrom> .K  </mailfrom> // list of names
        <guard>    .K  </guard>    // list of patterns
        <body>     .K  </body>
        // Bool to determine whether or not it is persistent
        <linear>   .K  </linear>
    </In>
    <Out multiplicity="*">
        <mailto>   .K  </mailto>  // list of names
        <msg>      .K  </msg>
        // Bool to determine whether or not it is persistent
        <single>   .K  </single>
    </Out>
For reference, this only contains about 11 cells, as opposed to, e.g., the EVM semantics, which uses about 60 cells. I think the Rholang semantics can stay that minimalist.

Using strict functions even lets you do pattern matching, that behemoth that blew up in my face the first time, without ever using a single cell. What a powerful thing!

If I were to sum up some of the lessons I’ve learned, it’d go something like this:

  1. Keep your syntax module as simple as possible. It pays to take time to really think it out.
  2. Avoid using cells whenever possible, for simplicity and ease of reading.
  3. You can accomplish (2) by carefully choosing how the syntactic categories are set up and defining strict functions where appropriate.

Programming in K is like writing in a language that gives no conveniences, where you yourself have to wrestle with all the subtle nuances of memory management, function strictness, lazy evaluation, variable scope and shadowing, and on and on and on. It’s a rude awakening to the fundamentals of language design that you have always taken for granted. It’s a tough but exhilarating ride I would recommend to anyone!