Building Database-Driven Web Applications With pgproc
For years, I've been building database interfaces for a wide range of projects. All of these projects have a very similar set of requirements:
- The database must scale easily because the volume of data will be ever-increasing, but you can't ever reimport, because doing so would be astronomically expensive
- The application code must scale, but you can't trust that all of the programmers using the database will be experts
- The project had to be done yesterday, and has to go into production before it is ready, but the company depends on it
No one would ever write those requirements down, because doing so would put a pack of lawyers' kids through a good school. But they're the requirements we have in reality nonetheless.
To address these requirements, all of my database projects have adopted a set of methodologies that have proven more robust in the face of humans:
- Only the most senior systems engineers shall touch the database, and all shall test their theories empirically before putting them into production
- All of the senior/junior devs who are the clients of the database shall only invoke the stored procedures defined by the senior systems engineers
- Any change to the database must be made via scripts that live in source control and are versioned (no hand-edits, ever)
Basically, you treat the database and its data as what they are: a mission-critical application that needs regular care and feeding by skilled engineers, like a steam engine racing down the track. Failure to care for and monitor it will result in it exploding. And since the systems engineers are usually performing open-heart surgery on a ticking time bomb racing down the track at 100mph, they need to be protected as much as possible from tight coupling with other systems. If one team writes a quick hack with some arbitrary SQL that joins across tables in unexpected ways, you have little to no way of untangling that ball of barbed wire once it's been delivered to the client (see requirement 3).
To avoid having users write any SQL at all, I've built a series of modules for various dynamic languages that generate closures on the fly against the database's catalog tables. Using system-wide reflection, we can query any moderately intelligent database for a list of its stored procedures. We can then use that list to generate a series of per-language functions that invoke those procedures as if they were native. Since many databases can also provide type information, we can do this even for static languages, but I've found that the benefits of using static languages for application-level code are fleeting. Poorly defined business requirements, changing customer goals, and an evolving marketplace mean that your code is going to change. Trying to fight change is death. By allowing the database programmers to change the underlying representations and arrangements, while keeping a consistent, high-level, consumer-facing API through stored procedures, we get to change much more while keeping our interface clean.
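The mechanics can be sketched in a few lines of JavaScript. This is a minimal illustration of the idea, not pgproc's actual code: the catalog rows here are hard-coded stand-ins for what a real implementation would read from pg_proc, and the query function is a stub where a real one would talk to Postgres.

```javascript
// Hypothetical catalog rows: one per stored procedure in the schema.
// A real implementation would get these by querying pg_proc.
const catalog = [
  { name: 'user_by_id', args: ['_id'] },
  { name: 'user_create', args: ['_name', '_email'] },
];

// Generate one closure per stored procedure. Each closure builds a
// parameterized "SELECT * FROM proc($1, ...)" call and hands it to the
// supplied query function, so callers never write SQL themselves.
function generateApi(catalog, query) {
  const api = {};
  for (const proc of catalog) {
    const placeholders = proc.args.map((_, i) => `$${i + 1}`).join(', ');
    const sql = `SELECT * FROM ${proc.name}(${placeholders})`;
    api[proc.name] = (...values) => query(sql, values);
  }
  return api;
}

// Stub query function that just echoes what it would send to the database.
const api = generateApi(catalog, (sql, values) => ({ sql, values }));
console.log(api.user_by_id('42').sql);
// SELECT * FROM user_by_id($1)
```

The point is that the application programmer only ever sees api.user_by_id(...); the SQL is generated once, from the catalog, by the module.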
For example, let's say you have a user_by_id(_id uuid) function in your app database's public schema that returns a user record as JSON. You could use pgproc to expose it on port 5555 by typing:
pgproc postgres://localhost:5432/app public 5555
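The user_by_id function itself might look something like the following. This is a sketch: the users table and its columns are assumptions for the sake of the example.

```sql
-- Hypothetical users table; column names are assumptions.
CREATE TABLE users (
    id    uuid PRIMARY KEY,
    name  text NOT NULL,
    email text NOT NULL
);

-- Return one user record as JSON.
CREATE FUNCTION user_by_id(_id uuid) RETURNS json AS $$
    SELECT row_to_json(u) FROM users u WHERE u.id = _id;
$$ LANGUAGE sql STABLE;
```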
You can then query the web interface, for example with curl:

curl http://localhost:5555/user_by_id?_id=<some-uuid>
No need to write anything but your table, and the stored procedure for accessing that table's data. I might support generalized CRUD, but that requires translating between the HTTP method and the appropriate stored procedure name. Over time, I've found that model tends not to be sufficient, or even interesting. Rather, I have begun to view the list of stored procedures in a schema much like the object-oriented concept of a class. Each schema defines a class, and each stored procedure is a method that operates on that class. The underlying tables that represent the data associated with those classes are the memory allocations for the instance objects. Using the same concepts throughout, you can model your application using a variety of techniques, but maintain a consistent understanding.
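A sketch of the schema-as-class idea, with all names hypothetical: the schema is the class, its functions are the methods, and the table holds the instances.

```sql
-- The "account" schema plays the role of a class.
CREATE SCHEMA account;

-- The backing table is the storage for the instances.
CREATE TABLE account.accounts (
    id      uuid PRIMARY KEY,
    balance numeric NOT NULL DEFAULT 0
);

-- A stored procedure plays the role of a method on that class.
CREATE FUNCTION account.deposit(_id uuid, _amount numeric) RETURNS void AS $$
    UPDATE account.accounts SET balance = balance + _amount WHERE id = _id;
$$ LANGUAGE sql;
```

Application code only ever calls account.deposit(...); how the balance is actually stored remains the database team's business.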
One of the things that I've also found useful is to automatically generate a machine-readable description of the API. In the case of the most recent version of pgproc.js, it auto-generates a help() function which returns a JSON structure describing each of the API calls you can make. The representation for each call is an S-expression in JSON, wherein the name of the stored procedure comes first, followed by each of its argument names. A well-designed API uses meaningful names for both the stored procedure and the arguments themselves. And while this function is no substitute for proper documentation, it does make it possible to programmatically discover the interface.
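For the running example, the S-expression-in-JSON shape would look roughly like this (the exact structure pgproc.js emits may differ, and user_create here is a hypothetical second procedure):

```json
[
  ["user_by_id", "_id"],
  ["user_create", "_name", "_email"]
]
```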
Another nice property of the help() function is that when running pgproc.http.js, you automatically get a /help URL, which describes in JSON each of the available API calls. This means a client can discover the capabilities of the server with minimal prior knowledge. If the API changes for some reason, the help API provides a way to communicate this change to the client, and also provides a way for integration tests to verify the signatures of the running procedures. It is a very good idea to version each of your APIs. By providing a version() function, you can return a version number that will allow the client to identify how to interact with the API, should the return values change.
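One possible shape for such a version() function, as a sketch; returning a plain integer keeps client-side comparisons trivial:

```sql
-- Bump the returned value whenever the API's return values change.
CREATE FUNCTION version() RETURNS integer AS $$
    SELECT 3;
$$ LANGUAGE sql IMMUTABLE;
```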
Building systems that interoperate smoothly is all about building well defined interfaces. Ensuring that interfaces remain consistent across languages and across protocols is a major source of friction. Entire industries are based upon artificial friction. Reducing your friction will only improve your velocity and ability to change.