Riak Pipes and the Joy of Rebar
So a few days ago riak_pipe was announced and naturally, all I could think was "you know, this is great timing". I happened to be evaluating components for a new project which involved processing large numbers of large files. I was originally leaning towards riak/luwak for storage + rabbitmq for messaging + Mojolicious for the c&c interface. I was going to build a set of perl daemons that processed each request and PUT the resulting binaries back into riak under different buckets.
All that changed three days ago when I got a chance to read through the pipes code base. It solved a few big problems:
- how to distribute work
- how to gather status
- how to notify later stages in the chain
- how to keep data local in the cluster
So yesterday and today I built a functioning prototype. Rebar took care of building the base OTP application and grabbing the 3 modules I needed from github along with all of their dependencies. It took about 5 minutes to work through the right order for specifying the applications for the OTP app, but once I got them sorted, and fixed the reltool config to point at the right dirs, I had a deployable skeleton running mochiweb, riak_pipe, and the rest of their deps. That is really a killer feature of rebar, which we have Bob Ippolito to blame for. (funny how much of his code is in this stack)
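For the curious, the dependency setup in rebar.config looked roughly like this (a sketch — the exact repos pulled and the branch tags are assumptions, not the precise revisions I used):

```erlang
%% rebar.config -- sketch of the github deps; rebar fetches these and
%% their transitive dependencies into deps/ on `rebar get-deps compile`.
{deps, [
    {mochiweb, ".*",
     {git, "git://github.com/mochi/mochiweb.git", {branch, "master"}}},
    {riak_pipe, ".*",
     {git, "git://github.com/basho/riak_pipe.git", {branch, "master"}}}
]}.
```

The ordering headache mentioned above is about the `applications` list in the .app.src file: each application has to be listed after the ones it depends on, or the release won't boot cleanly.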
Then there were two functional bits to write:
- a gen_server which managed the work queue interface and the pipes.
- a riak_pipe_vnode_worker which created the fitting and handled the processing
I decided to add a create/0 function to the worker, since the worker knows what it is, which allows for using the ?MODULE macro in the #fitting_spec. I also chose to bundle the riak_pipe:exec/2 call into create/0 as it seemed like a good place for it. In the next iteration, I will probably add a create/2 function that takes a Sink and Options, allowing for chaining creates together. Since my process/3 function could fetch, process, and upload all in one step, I delayed this breakout.
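The shape of the worker looks something like this (a sketch, assuming riak_pipe's riak_pipe_vnode_worker behaviour as of the announcement; the module name and the fetch/transform/store helpers are hypothetical stand-ins for the actual file handling):

```erlang
-module(file_worker).
-behaviour(riak_pipe_vnode_worker).

-include_lib("riak_pipe/include/riak_pipe.hrl").

-export([create/0, init/2, process/3, done/1]).

-record(state, {partition, fitting_details}).

%% Bundling the #fitting_spec and the riak_pipe:exec/2 call here means
%% only the worker module needs to know its own name (?MODULE).
create() ->
    Spec = [#fitting_spec{name = file_processor,
                          module = ?MODULE}],
    riak_pipe:exec(Spec, []).

init(Partition, FittingDetails) ->
    {ok, #state{partition = Partition,
                fitting_details = FittingDetails}}.

%% Fetch, process, and upload all in one step -- the reason a separate
%% downstream fitting wasn't needed yet.
process(Input, _Last, State) ->
    Data = fetch(Input),        %% hypothetical: GET the file from riak
    Result = transform(Data),   %% hypothetical: the actual processing
    store(Input, Result),       %% hypothetical: PUT result to another bucket
    {ok, State}.

done(_State) ->
    ok.
```

A create/2 variant would just take Sink and Options and thread them into the #fitting_spec and the riak_pipe:exec/2 call, so one worker's create can hand its output to the next.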
After about 8 hours of reading, playing, and coding I had a working 5 node cluster processing a good number of files. The hashing allowed me to keep the workers local to the storage vnodes, and by playing with the hash function I could steer work to certain nodes (and compare performance without data locality). I could also easily build a version w/o riak_kv for worker-only nodes. Overall it was a joy. Next step is a c&c interface in mochiweb.
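The hash tricks amount to swapping the chashfun in the #fitting_spec. A sketch of the two variants I played with (assuming riak_core_util:chash_key/1 and made-up bucket/key names):

```erlang
%% Data-local: hash each input exactly the way riak_kv hashes its
%% bucket/key, so the worker lands on the vnode that stores the object.
local_chash({Bucket, Key}) ->
    riak_core_util:chash_key({Bucket, Key}).

%% Steered: ignore the input and hash a constant, pinning all work to
%% one partition -- handy for comparing runs without data locality.
pinned_chash(_Input) ->
    riak_core_util:chash_key({<<"pinned">>, <<"constant">>}).
```

These would be passed as `chashfun = fun ?MODULE:local_chash/1` (or pinned_chash/1) in the #fitting_spec.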