Ferl and a model for cluster-based programming
For the past 25 years, I've been playing around with the idea of programs that live in the network. Rather than being tied to a specific process or CPU, the program exists as a coherent set of state changes that can move from computer to computer throughout its lifecycle. This concept offers the ability to explore new types of behavior.
Ferl is the latest incarnation of this concept. Built on top of the Erlang VM, Ferl is a dual-stack, machine-based language in the spirit of a classic Forth. From an implementation standpoint, the Ferl VM consists of a core set of standard operators: arithmetic, logical, stack, flow, defining, and network. While the first five categories are familiar territory, the network operations extend a program's behavior across multiple machines:
- -> machine ( goto machine )
- <- ( goto machine on top of stack )
- >> machine function (fork to machine )
- << function ( fork to machine on top of stack )
- > [ machine list ] ( goto multiple machines )
- < ( goto multiple machines on top of stack )
- ?> machine ( conditional goto machine )
- ( conditional goto machine on top of stack )
These operators transfer or replicate a program to zero or more machines, and continue operation there. The simple -> machine goto is a network-based jump: rather than moving the instruction pointer from one address to another, it moves the entire program from one machine to another. The corresponding conditional goto, ?>, works like a conditional jump. If the top of the data stack is non-zero (aka true), the evaluation of the program migrates from one machine to another.
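As a quick sketch of the two gotos (the machine names and the needs-more-data word here are hypothetical, invented purely for illustration):

```forth
\ Hypothetical Ferl fragment: unconditional and conditional migration.
-> db1            \ the entire program state migrates to db1 and resumes there
needs-more-data   \ hypothetical user word: leaves non-zero on the stack if true
?> db2            \ if the top of the data stack is non-zero, hop onward to db2
```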
The fork-to-machine operation is the equivalent of the Erlang spawn/4 function: the current state of the program is transferred to the specified machine, and the program resumes there at the specified definition. Meanwhile, on the original machine, the program continues as if nothing happened. This allows a program to efficiently distribute instances of itself across a network to perform a series of different or similar tasks.
The parallel goto > works much like the >> fork operator, except that all machines resume processing at the same point. This allows a program to distribute itself across a network and work in parallel, which makes map-reduce-style programs trivial: a program can be written to naturally perform the same reduction across an entire network.
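A sketch of the scatter step might look like this (the node names and the count-local and report words are assumptions of mine, not part of Ferl's standard set):

```forth
\ Hypothetical: replicate the program to every node in the list;
\ each copy resumes here and performs the same local reduction.
> [ node1 node2 node3 ]
count-local      \ hypothetical word: reduce over this node's local data
report           \ hypothetical word: pass the partial result onward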
It is important to note that none of these operations perform any sort of collection of related processes. While Ferl has a concept of spawning a thread, you can't join one. The reason lies in the fundamental concept of a program: once a program forks a process, the system treats each as a first-class citizen. If two programs need to communicate, they do so through message sends. In Ferl, the Forth memory-access operators @ and ! have been repurposed to mean receive and send. The receive @ operator reads a message from the process inbox and places it on the stack. Similarly, the send ! operator sends the next value on the stack to the process id on the top of the stack. To acquire the process id of the current process, the $ operator pushes it onto the stack.
For example, if a program wants to receive data from a set of child processes, it might use:
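One plausible reconstruction of such a listing, described in the paragraph that follows (the = comparison and the if … else … then words are my assumptions, as is the idea that every copy, including the original, resumes just after the split):

```forth
\ Hypothetical reconstruction, not a verified Ferl listing.
: main
  $                            \ save the original pid on the stack
  > [ slave1 slave2 slave3 ]   \ split across the three slave machines
  $ =                          \ compare the current pid to the saved one
  if
    wait-for-slaves            \ same pid: this is the original, collect messages
  else
    do-work                    \ different pid: this is a child, do the work
  then ;
```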
Here the main process places its pid on top of the stack and then spawns onto three machines (slave1, slave2, and slave3). By comparing the current pid against the one placed on the stack before the split, it either waits for messages (if it is the original) or does work. The wait-for-slaves and do-work definitions are user-defined words of the program.
The list constructor is used to declare an inline list, which exists as a single object on top of the stack. Ferl also supports JSON-esque property lists (objects) alongside plain lists; these too occupy a single stack slot. Lists and objects can be dereferenced using the . operator. List and object mutation is currently not allowed in Ferl, though it may be in future versions; however, new objects and lists can be created by concatenating other lists with the ++ operator. The reason for this largely lies in the premise that most processing should occur on the stack, and that the state of the program must be easily transferable from one machine to another. It is easier to guarantee the behavior of a list or object if we need not worry about the details of how it is laid out in memory.
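As a sketch of how this might read (the literal and index syntax here is my guess at Ferl's notation, not confirmed by the language):

```forth
\ Hypothetical syntax sketch: list literal, concatenation, dereference.
[ 1 2 3 ]        \ push an inline list as a single stack object
[ 4 5 6 ] ++     \ build a new list by concatenating the two
0 .              \ dereference: fetch the element at index 0 (assumed form)
```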
For real-world applications, it is often necessary to manipulate local state. In fact, a large part of the reason Ferl exists is that it is often more efficient to distribute a program to where the data is stored than it is to move the data to the program! When working with such data, Ferl allows access to local Erlang functions and NIFs using the module:function/arity symbolic reference. The top values of the stack are passed to the function in order, with the top of stack being the first parameter, and so forth. Using these calls, a program may access local data, the network, and so on.
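For instance, a program that has migrated to a node holding an mnesia table might read a row in place. The table name and key below are illustrative; since the top of stack becomes the first parameter, the table is pushed last:

```forth
\ Illustrative: call a local Erlang function where the data lives.
\ mnesia:dirty_read(users, 42) -- push the key, then the table name.
42 users mnesia:dirty_read/2
```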
Playing around with Ferl's programming model has definitely shifted my perspective on Erlang. One of the core concepts that changed for me was locality. With the ability to transfer the state machine from machine to machine, the data sets stored on each machine could be radically different. I could decompose data sets into series and collections of mnesia databases. For example, an index table could become a router, providing a list of nodes to visit to acquire a complete dataset. Data could be sparsely scattered over multiple clusters, and the program would visit each one. With parallel dispatch, the notion of scatter/gather programming feels natural, but the need to supply a query engine to every node seems wrong. In fact, the entire notion of a database has started to feel off. Sending programs to inspect large memory structures seems more natural and sensible. Adding results to a database simply means prepending to the index, with a short circuit in the program.
There is a lot more room to explore in this strange country.