It's been a long time since I've written anything; we just started a pivot at Leaf, so things have been pretty crazy. But like any radical change, there's been a ton of new things to explore and learn from as well, so I suppose there's a silver lining in there.

As we're converting our code over to a new platform, we've switched persistence from Kafka to Postgres, which means there's a whole world of serialization code we need to write & test. Normally SQL databases are difficult to unit test; they're usually covered only by integration tests that rely on programs running outside the scope of your project. For testing behavior I still can't think of a better way, but a colleague's off-hand remark that it's impossible to test any of the SQL code got me thinking on my commute home.

First off, what would the requirements be for testing SQL? What can you reasonably test given some code & the DDL for schema creation?

  1. Needs to ensure column names are serialized correctly
  2. Can’t rely on any files outside of the project

That seems like a manageable set of requirements to start with. Since the only thing I'm comparing is the column names, let's take a look at an example file:

--comment comment

create table tables (
  id int identity not null,
  label varchar(15) not null,
  location int not null
)

create table locations(
  id int identity not null,
  name varchar(15) not null,
  owner varchar(50) not null
)

-- more comments

Anyone would agree this is a really simple SQL schema, but does that mean it's really simple to parse? To help answer this question, let's investigate the basic grammar for that SQL statement. I'm going to drop the < and > around all of the identifiers because I think they clutter the grammar.

DDL         ::= Statements | Gap
Statements  ::= Statement | Gap Statement Statements
Gap         ::= Whitespace | NewLine
Statement   ::= Operator Name "("  Arguments ")"
Operator    ::= "create table"
Arguments   ::= Arg | Arg "," Arguments
Arg         ::= Name DataType Nullable
Name        ::= text_literal
DataType    ::= Category | Category "(" Size ")"
Category    ::= "int" | "varchar"
Nullable    ::= "null" | "not null"
Whitespace  ::= Space | Tab
Space       ::= " "
Tab         ::= "\t"
NewLine     ::= "\n" | "\r"

This grammar says that a DDL file consists of 0 or more Statements, each separated by a gap of whitespace or a line break. Each statement is composed of one or more arguments, which maps directly onto our intuition that a schema consists of one or more tables, each made up of one or more columns. This isomorphism is valuable for understanding how exactly we’ll parse this statement into something we can use to help with our unit testing. To start with, I’ll need an object that mimics what I want for my column name testing and some of the tokens I’ll be looking for.

case class tbl(name: String, columns: Seq[String])

val Return = "\r"
val NewLine = "\n"
val Comma = ","
val Space = " "
val CreateTable = "table"

I chose to represent CreateTable as just the word table rather than create table as an optimization that I’ll delve into later.

Now that the tokens are in place, it's time to start building the elements of my SQL DDL grammar. Parboiled2, the excellent Scala parsing library by Mathias Doenitz & co., allows me to write code that closely mirrors the BNF grammar, which means as you write your grammar you can actually test it out in real time & the code should read just like the grammar. Pretty cool stuff, so let's write some code!

case class DdlParser(input: ParserInput, columnStart: String) extends Parser {
    def DDL           = rule { Statements.*  }
    def Statements    = rule { Ignore ~ Table }
    def Table         = rule { TableFlag ~ TableName ~ Ignore ~ Arguments ~> tbl }
    def TableName     = rule { capture(!EndName ~ ANY).+ ~> (_.mkString("")) ~ EndName}
    def Arguments     = rule { Arg.*.separatedBy(Ignore) }
    def Arg           = rule { columnStart ~ capture(!Space ~ ANY).+ ~>(_.mkString("")) ~ Space}
    def TableFlag     = rule { CreateTable ~ Space }
    def EndName       = rule { Space | "(" }
    def Ignore        = rule { (! (CreateTable | Space ~ Space)  ~ ANY).+ }
  }

It's a fair bit uglier than a real BNF grammar definition, but given that we need to define functions & chain calls together, Parboiled2 is a great step in the right direction. So what's going on here? Let's break this down line by line.

The DdlParser case class extends Parser which is the base class of all Parboiled2 parsers & requires a ParserInput val named input. Since this is a case class, you probably inferred that each time a document is parsed a new parser is created. Thankfully these are lightweight objects.

DDL is the top-level rule, just like in the grammar itself. The rule macro is called with Statements.* in its scope. This is a dense line, but under the covers it's generating a parser rule that matches 0-n Statements as a sequence. Statements in turn is just an ignored range followed by a Table (that's what the ~ combinator means). The parser continues walking down the DDL parse tree passing over elements (we haven't added anything to the stack yet) until it reaches the TableName rule in Table, where you'll notice capture(!EndName ~ ANY).+. This statement pushes characters onto the stack until it reaches EndName, which triggers a parser failure & backtracking to its parent rule, TableName. Once one or more characters have been pushed, it's time to turn them into a string using Parboiled2's ~> action combinator, which is a function of the prior rule's result/side effects (i.e. elements pushed onto the stack). In the DdlParser we want the full table name, so it simply turns the sequence of characters pushed into a string.

After a table name has been matched & pushed onto the stack as a string, the parser will begin processing Arguments, or column names in this example. Each Arg is preceded by some separator that's been passed into the parser at construction time. In our case this is two spaces, "  ". So, just like we did with the table name, the parser will push column name characters onto the stack until it reaches a terminating space, then transform those characters into a string & push it back onto the stack. Once the arguments have been parsed & pushed, the stack should look like this:

[TOP OF STACK]  Vector(id, label, location)
[  Next Cell ]  tables

Parboiled2 has a great feature: when the data on the stack fits a case class, as it does tbl here, you can use an action rule to transform the stack elements directly into the case class, which is what's happening in Table! Once the parser has walked through all the provided characters it returns a Try[T] with your expected result or a ParserError indicating where you went wrong.
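If you just want to see the shape of the final result without pulling in Parboiled2, here's a dependency-free sketch that extracts the same tbl values with a regex. To be clear, SchemaSketch and extractTables are my own hypothetical names, not part of the parser above, and a regex is a much blunter instrument than a real grammar:

```scala
// A rough approximation of what DdlParser produces, handy for eyeballing
// the expected result. Not the real Parboiled2 parser.
case class tbl(name: String, columns: Seq[String])

object SchemaSketch {
  // Matches "create table <name> ( ... )" where the closing paren sits on
  // its own line, as in the sample schema.
  private val TableRe = """(?s)create table\s+(\w+)\s*\((.*?)\n\)""".r

  def extractTables(ddl: String): Seq[tbl] =
    TableRe.findAllMatchIn(ddl).map { m =>
      val columns = m.group(2)
        .split(",")
        .map(_.trim.split("\\s+").head) // first token of each arg is the column name
        .toSeq
      tbl(m.group(1), columns)
    }.toSeq
}

val ddl =
  """--comment comment
    |create table tables (
    |  id int identity not null,
    |  label varchar(15) not null,
    |  location int not null
    |)
    |""".stripMargin

val parsed = SchemaSketch.extractTables(ddl)
```

A unit test for the serialization code could then simply compare the column names it emits against `parsed.head.columns`, no external database required.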

Hopefully after reading this & playing around with the code you're as excited as I am about what Parboiled2 can do! (I'm not going to show you how to test your serialization code.)

Microservices: Unix-style toolchains for the Internet age

So what exactly is a microservice? A very good question if you ask me. Like monads, it seems that almost everyone has a slightly different understanding of what a microservice actually is. So, while I'm going to stay out of the monad-definition cottage industry for now, I think it's important to explore the overloaded & overly-buzzwordy world of microservices.

Wikipedia defines microservices as:

… microservices is a software architecture design pattern in which complex applications are composed of small, independent processes communicating with each other using language-agnostic APIs. These services are small, highly decoupled, and focus on doing a small task.

Like any good buzzword definition, this one is littered with other confusing or ambiguous terms. What exactly is a “software architecture design pattern” in this context? We’ve all read the GoF design patterns book but I’m pretty sure we never saw anything about microservices or SingletonVisitorMicroserviceFactory in there, so is this the same type of pattern? In this context I’m taking the term design pattern to mean more of a design philosophy about decoupling software. That fits nicely with the language-agnostic APIs, which is also confusing but I’m pretty sure it just means REST, HTTP, Thrift, or some other custom communication protocol. The second sentence is significantly clearer and offers a recipe for creating a microservice: keep it small, decoupled & focused. Perfect!

So is this a microservice?

object MyFirstService {

  def strToFoo(s: String): Future[Foo] = ???

}

It's small, decoupled, communicates via raw strings (assuming it's fronted by some REST interface) & certainly focuses on doing just one small task. Unfortunately no, I don't think this is a microservice, or if it is then it has a truly awful API. This is really just a single function exposed to the world, which means it's either a God function that does everything or it's too narrowly focused; either way I think that means this isn't a real microservice.

I think we need some more viewpoints, so let's turn to Martin Fowler & the good folks at Thoughtworks for their take on microservices:

The term “Microservice Architecture” has sprung up over the last few years to describe a particular way of designing software applications as suites of independently deployable services. While there is no precise definition of this architectural style, there are certain common characteristics around organization around business capability, automated deployment, intelligence in the endpoints, and decentralized control of languages and data.

This brief definition doesn't do his full article justice, but at its core it shows that microservices help manage complexity in some areas while lifting a significant amount of complexity from a classic 3-tier monolith's code into the operations/DevOps level. I think this is a spectacular step in the right direction, but I take umbrage with the idea that microservices are a relatively recent phenomenon.

I don't pretend to be a software historian, but I do happen to know that Unix & Linux are built on top of a very old & very specialized set of tools. The *nix toolchain adheres to the "do one thing well" philosophy & often exposes results on a standardized output buffer. These outputs, coupled with the venerable pipe operator, allow a collection of specialized tools to be composed together in a clear and logical manner in order to accomplish a larger goal. These specific tools have all been around for ~30 years, and the philosophy goes back even further.

And that hits on my own definition of microservices, which is roughly equivalent to Martin's but hopefully much clearer:

Microservices are internet-connected tools adhering to the Unix “do one thing well” philosophy & sharing a common input/output language

Flying back from a nice 2-week vacation in Texas, I had a chance to think about primitive operators. Specifically, in this post I'll be exploring what a primitive operator actually is and whether there's such a thing as "more powerful" primitives.

To get started I'll be working with map, a function we all hold near and dear. Map's signature takes a value of some type A and a function A => B. So the full signature would be:

def map[A, B](x: A)(f: A => B) : B

This function has as its range and domain all possible morphisms across two types. For instance, based on the signature alone, map can transform an Int into a Dinosaur via some magical function f(i: Int). However, what it cannot do is transform an Int and a String into a Dinosaur. You could accomplish this via a Tuple[Int, String] and an f(t: (Int, String)), but because you are still acting on a single parameter you're unable to partially apply map to the tuple. Really this means we've been forced into evaluating all or none of the parameters with no ability to choose between them. That seems like an unnecessary constraint for a morphism.

This suggests there exists some more "powerful" version of map that allows us to partially apply across multiple parameters and evaluate them independently. For lack of a better name, let's call this map2. Map2 accepts two types of parameters & returns a third, so we know its signature looks like this:

def map2[A,B,C](a: A, b: B)(f: (A, B) => C): C

Map2 is able to handle anything map can do by providing a no-op type as B’s type argument. For example, here’s map implemented via map2:

def map[A, B](a: A)(f: A => B): B =
	map2(a, NoOp)((x, _) => f(x))

This on its own buys us a lot of power because we no longer need to worry about actually applying the computation for the morphism in map; instead we can defer it to map2. The next question I have about map2 is whether we can create a function map3 that is again more powerful than map2. Logically it would seem that since map2's domain and range are strictly supersets of the domain and range for map, it should hold that map2 is likewise a subset of map3, right?

It turns out this isn't quite the case at the implementation level. You can actually implement all other arity mapping functions in terms of map2 by using a trick similar to how a right fold operates, that is, by building up a chain of function calls to provide the proper arity. Here's map3:

def map3[A, B, C, D](a: A, b: B, c: C)(f: (A, B, C) => D): D = 
	map2(a, b)((x, y) => map2(c, NoOp)((z, _) => f(x, y, z)))

This ends up creating a chain of map2 calls that can be composed together as f • g • … to generate the full chain.
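To make the chain concrete, here's a runnable sketch. The NoOp case object is my own addition (the post leaves the "no-op type" abstract), and this map2 simply applies f eagerly, which is enough to demonstrate the arity trick:

```scala
// A concrete stand-in for the "no-op type" used to pad unused parameter slots.
case object NoOp

def map2[A, B, C](a: A, b: B)(f: (A, B) => C): C = f(a, b)

// map via map2: pad the second slot with NoOp and ignore it.
def map[A, B](a: A)(f: A => B): B =
  map2(a, NoOp)((x, _) => f(x))

// map3 via a chain of map2 calls, right-fold style: the outer call binds
// a and b, the inner call binds c and completes the arity-3 application.
def map3[A, B, C, D](a: A, b: B, c: C)(f: (A, B, C) => D): D =
  map2(a, b)((x, y) => map2(c, NoOp)((z, _) => f(x, y, z)))

val doubled = map(21)(_ * 2)
val summed  = map3(1, 2, 3)(_ + _ + _)
```

The same padding-and-nesting pattern extends mechanically to map4, map5, and so on, each one delegating to map2.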

As I think about it, it is actually possible to use the same approach to construct a similarly deep function call chain for map, allowing us to use map as the base primitive and implement all other arity maps in terms of it.

With standard map it's still impossible to use a single function call to support more than one parameter, whereas map with arity > 1 inherently supports this.

I'll update this post once I've had a chance to do some more research on which of the two functions is actually the base primitive.

In my last post about state machines I talked about how they can help provide simplifying assumptions for modeling your applications. However, all the simplifying assumptions aren’t very helpful if you can’t actually change your data. In this post I’ll discuss guarded transitions & how to model them in Scala.

The Basics

A basic state machine, especially the DFA we modeled in the previous post, transitions directly from the current state to the new state upon an input. However, how do we model this if we need to perform a calculation or operation that may fail during the transition from state A -> B? The way I see it, you have two options:

  • Hide the side effect in the transition and make the transition synchronous.
  • Introduce an intermediate state B’ representing the intent or attempt to transition from A -> B.

In the first option, there's a big advantage in keeping your conceptual model clean, which makes it easier for others to consume. Unfortunately, as soon as you introduce guarded transitions this conceptual "clarity" actually turns into a muddy pool of non-determinism. Once the possibility of failure is introduced, a transition from A -> B no longer describes a DFA but an NDFA (Non-Deterministic Finite Automaton): if the transition fails you'll actually wind up back in A (or a failure state), otherwise the machine transitions to B. As you can imagine, this is more difficult to reason about. In a Scala implementation it also means that, in the event of a complete failure, you won't know the transition was ever requested.

The alternative approach using an intermediate state means your diagram becomes much more complicated, with each intermediate state that may fail protected by a guard state. However daunting this pattern may seem initially, it’s actually incredibly easy to reason about so long as developers all understand the pattern (just like any other design pattern). The major advantages here are that the FSM is always a DFA, i.e. you always know exactly the state you’re in. If there is input that triggers a guarded transition it immediately transitions into the guard state, then returns. Yup, this means that you’re always transitioning immediately then handling the forward/backward transition in a callback. The only allowed transitions from B’ should be to A on failure or B on success.

A Simple Scala Example

Bank accounts. Most of us have them (some prefer a shoe-box under the mattress though), and all of us have probably either used one as an example or read examples using them before, so the mechanics should be pretty straightforward.

To start with, let's think about what we'll need to model a DFA: the transition table and a domain to work with:

trait DFAState

trait Result
trait Request

trait DFA[T, Input] {
  def transition(in: Input)(state: T): (Result, T)
}

trait TranRequest{
  val amount: Int
}

case class Credited(amount: Int) extends Result
case class Debited(amount: Int) extends Result
case class OverdrawApproved(amount: Int) extends Result
case class OverdrawRejected(amount: Int) extends Result

case class ApproveOverdraw(amount: Int) extends Request with TranRequest
case class RejectOverdraw(amount: Int) extends Request with TranRequest
case class Credit(amount: Int) extends Request with TranRequest
case class Debit(amount: Int) extends Request with TranRequest

trait AccountState extends DFAState
case object Open extends AccountState
case object Overdrawn extends AccountState

There’s pretty much nothing actually going on here, but I’ve defined the basic structures we’ll need for working with our bank account. We can attempt to either Credit or Debit the account, which will result in a credit or debit to the account. If we try to take too much money out of the account it triggers an overdraw, which needs to ask a hypothetical overdraw controller if we’re allowed to do this. Given that we don’t want to immediately allow the overdraw, hand out the money, then come back and decide “oops, probably shouldn’t have done that”, we’ll need some intermediate state between Open & Overdrawn.

To that end I've created a simple wrapper state, Guarded, which wraps any other state along with the initial state we were in. Pretty easy to roll back when you're carrying the initial state around w/ you, isn't it! Okay, let's look at some code:

case class Guarded[A <: AccountState, B <: AccountState](s: A, prev: B) extends AccountState

/**
 * A simple case class representing a bank account.
 *
 */
case class BankAccount(balance: Int, state: AccountState)

object BankAccount extends DFA[BankAccount, TranRequest] {

  def transition(in: TranRequest)(state: BankAccount): (Result, BankAccount) = {
    state.state match {
      case Open => in match {
        case Credit(x) =>
          (Credited(x), BankAccount(balancePostTransaction(state, in), Open))
        case Debit(x) if balancePostTransaction(state, in) >= 0 =>
          (Debited(x), BankAccount(balancePostTransaction(state, in), Open))
        case Debit(x) =>
          (Debited(x), BankAccount(state.balance, Guarded(Overdrawn, Open)))
      }
      case Guarded(Overdrawn, prev) => in match {
        // Allow the guarded transition to proceed
        case ApproveOverdraw(amt) =>
          (OverdrawApproved(amt), BankAccount(balancePostTransaction(state, in), Overdrawn))
        // Reject the guarded transition and move back to the prior state
        case RejectOverdraw(amt) =>
          (OverdrawRejected(amt), state.copy(state = prev))
      }
      case Overdrawn => in match {
        case Credit(x) => ???
      }
    }
  }

  // Debit requests carry a negative amount, so addition covers both cases.
  private def balancePostTransaction(acct: BankAccount, tran: TranRequest): Int =
    acct.balance + tran.amount

}

I tend to have a bad habit of throwing a lot of code up before speaking to it, and this is no exception. To start with, we have the Guarded[A,B] type described above, followed by a trivial context (state carrier) and the transition table. Most of the transitions are unrelated to what we’re actually concerned with here, but they help to add some ambiance (or I got distracted and wrote them during this flight…).

The really interesting piece and the entire reason for this post is the use of Guarded. You’ll notice halfway through the transition table that attempts to overdraw don’t result in any change to the account balance and they move the state to Guarded. This pattern allows a quick and easy rollback in the event of a failed overdraw, or a successful transition to Overdrawn with the new amount computed.

What does this buy us? I’ll explain.

Why use the Guarded Transition Pattern?

When compared to the alternative of wrapping a possibly-failing computation up in a transition, the Guarded pattern has two main advantages. First, it forces a complete separation of concerns between the logic and the DFA model. For instance, a possible implementation may involve a parent service that orchestrates actions based on the DFA's response following a transition. This allows the service to wrap the transition up in a Future, execute it synchronously, or do anything else with it, all unbeknownst to the DFA. Additionally, if events are being persisted in real time w/ something like Akka Persistence, you'll know exactly what computation was in progress if the system crashes. Second, the guarded transition pattern provides a simple, structured error handling & rollback approach for potentially failing transitions. This is especially important if that transition involves a third-party service for something like authentication, lookup, etc…
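To see the rollback in action, here's a trimmed, runnable sketch of the same idea. The names are my own simplifications (string commands in place of the Request hierarchy, only the debit/overdraw path), so treat it as an illustration of the pattern rather than the account model above:

```scala
sealed trait AccountState
case object Open extends AccountState
case object Overdrawn extends AccountState
// The guard wrapper carries the state we came from, making rollback trivial.
case class Guarded(next: AccountState, prev: AccountState) extends AccountState

case class Account(balance: Int, state: AccountState)

def transition(in: String, amount: Int)(acct: Account): Account =
  (acct.state, in) match {
    case (Open, "debit") if acct.balance - amount >= 0 =>
      Account(acct.balance - amount, Open)
    case (Open, "debit") =>
      // Would overdraw: park in the guard state, balance untouched.
      Account(acct.balance, Guarded(Overdrawn, Open))
    case (Guarded(next, _), "approve") =>
      // Guard cleared: apply the pending debit and move forward.
      Account(acct.balance - amount, next)
    case (Guarded(_, prev), "reject") =>
      // Guard failed: roll back to exactly where we started.
      acct.copy(state = prev)
    case _ => acct
  }

val start    = Account(100, Open)
val pending  = transition("debit", 150)(start)
val rejected = transition("reject", 150)(pending)
```

Note that `pending` still holds the original balance of 100 in state Guarded(Overdrawn, Open), and `rejected` is indistinguishable from `start`: the failed attempt leaves no trace on the data, only (if you persist events) in the event log.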

“What’s the point of λ Calculus?” - Matt

I started this blog off with Lambda Calculus for Government Majors which covered the basic concepts & definitions of lambda calculus. But honestly, knowing what λ means doesn’t really get you anywhere. My colleague Matt pointed this out after reading the piece, and he’s right. So what can you do with λ calculus? I mean really, what’s the point of this fancy algebra? Hopefully the following article explains why people get so excited about this stuff.

Refresher

  • λa.a => λ indicates this is a lambda expression/function
  • β reduction => apply a function by substituting its argument for the bound variable
  • α conversion => create unique names for each variable to prevent collisions
  • η reduction => strip out unnecessary bound variables

For more info on these just take a peek back at my earlier post.

Addition

Before getting into what you can do with λ calculus, I should probably show an example of how it actually works once it’s been expanded. Expanding a λ expression is essentially replacing bound variables with provided arguments. This can end up throwing a ton of λ and parens at you, but it helps clarify the relationship between a reduced function & any data passed in. To start with, I’ll take advantage of the fact that λ calculus is simple. So simple that it has no concept of numbers, addition, or anything else. This poses a bit of a problem if I’m trying to illustrate using λ for addition. I guess we’ll just start at the beginning and define the natural numbers.

We know that for every natural number x >= 0, x + 1 is also a natural number. This means the notation we're all used to for, say, 5 is equivalent to 0+1+1+1+1+1. I'll use Church encoding (named after Alonzo Church, the inventor of λ calculus) to encode this property into valid λ functions.

zero  = λa.λn.n
one   = λa.λn.a n
two   = λa.λn.a (a n)
three = λa.λn.a (a (a n))
....

What's really interesting about these numbers, and pretty much every other λ expression you'll see, is that each is a higher-order function, i.e. a function that takes another function. Those lambda expressions will grow pretty large pretty quickly, so I'll just be using normal numerals instead (developers are lazy after all).
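These encodings translate directly into Scala, which makes the "numbers are higher-order functions" point concrete. The Church type alias and toInt helper are my own shorthand, specialized to Int for readability (the real λ terms are untyped):

```scala
// A Church numeral applies a function a to a starting value n some number of times.
type Church = (Int => Int) => Int => Int

val zero: Church  = a => n => n
val one: Church   = a => n => a(n)
val two: Church   = a => n => a(a(n))
val three: Church = a => n => a(a(a(n)))

// Recover the familiar numeral by counting applications of (+ 1) starting at 0.
def toInt(c: Church): Int = c(_ + 1)(0)
```

Running `toInt` over each numeral recovers 0, 1, 2, 3, confirming the encoding really does count function applications.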

Are we ready for addition yet? We know how to create numbers, so what else do we need? It turns out not much:

add = λx.λy.λa.λn.x a (y a n)

So now that I have a whole bunch of λ’s & some parens, what’s actually going on here? Let’s use 1 + 1 as an example:

add one one = ((λx.λy.λa.λn.x a (y a n)) λc.λd.c d) λe.λf.e f
	=> I performed an α conversion on the line above to avoid shadowing on 'one'
	(λy.λa.λn.(λc.λd.c d) a (y a n)) λe.λf.e f
	λa.λn.(λc.λd.c d) a ((λe.λf.e f) a n)
	λa.λn.a ((λe.λf.e f) a n)
	λa.λn.a (a n)

Who knew something as simple as 1 + 1 would expand into that! Thankfully this is the equivalent of doing binary arithmetic in assembler, i.e. something you won't need to do by hand again. On the positive side, it illustrates the power of function expansion using higher-order functions. It takes a while to work through, and I'd strongly recommend trying the same exercise with 1 + 2, since you already know the answer is λa.λn.a (a (a n)).
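The same add function can also be checked mechanically rather than by hand. Here's a self-contained Scala sketch (again with my own Int-specialized Church alias as a stand-in for the untyped terms):

```scala
type Church = (Int => Int) => Int => Int

val one: Church = a => n => a(n)
val two: Church = a => n => a(a(n))

// add = λx.λy.λa.λn.x a (y a n): apply a, x more times, on top of y applications.
val add: Church => Church => Church =
  x => y => a => n => x(a)(y(a)(n))

// Count applications of (+ 1) starting at 0 to read the numeral back out.
def toInt(c: Church): Int = c(_ + 1)(0)
```

Evaluating `toInt(add(one)(one))` and `toInt(add(one)(two))` reproduces the 1 + 1 and 1 + 2 expansions above without the page of parens.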

What Use Is This?

One word. Compilers. When you step back and think about it, λ calculus is actually Turing complete (hopefully that's more apparent after our foray into Church encoding), so it makes a good minimal intermediate language. Anything that can compile down to λ calculus (any Turing-complete language) can then be translated up into another language for execution. So whenever someone says "every language is the same" you can show off your knowledge of λ calculus & isomorphisms between languages!

Hopefully this helped, but if you want to learn more check out these resources:

Wikipedia

Compiling Up to Lambda Calculus