A hectic two weeks saw me doing too much mundane busywork, but I still found time to start working with RabbitMQ and processin
Learnings
- Problem using OpenStruct with ERB: Ruby 1.9 ERB handles binding variables slightly differently. Compare:
  - (1.9) ERB.new(tmpl).result(vars.instance_eval { binding })
  - (1.8) ERB.new(tmpl).result(vars.send(:binding))
- marcel/aws-s3: Excellent Ruby library for interacting with Amazon's S3.
- Messaging Patterns by Álvaro Videla: Informative overview of RabbitMQ.
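For the OpenStruct/ERB item above, here is a minimal sketch of the 1.9-style call. The render helper, the template text, and the field names (queue, total) are made up for illustration; only ERB and OpenStruct come from the standard library.

```ruby
require 'erb'
require 'ostruct'

# Render an ERB template, resolving bare names in the template
# against the fields of an OpenStruct. (Hypothetical helper name.)
def render(template, vars)
  # instance_eval runs the block with self set to vars, so the binding
  # captured here resolves `queue` by calling vars.queue.
  ERB.new(template).result(vars.instance_eval { binding })
end

vars = OpenStruct.new(queue: 'test-queue', total: 3)
puts render('<%= queue %> has <%= total %> messages waiting.', vars)
# prints "test-queue has 3 messages waiting."
```

The point of instance_eval is that binding is a private method, so it has to be called with the OpenStruct as the implicit receiver.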
Processing Map Reduce Output by Line
I needed to consume Amazon Elastic MapReduce (EMR) output and turn it into a format compatible with mysqlimport (pipe-delimited, in my case). What started out as a quick hack turned into something kind of nice. The key benefit of this script is that it processes your EMR result files by streaming them directly from S3, saving you the hassle of copying, processing, and purging! With a decent network connection (~800KB/s) I was able to parse 26 files of ~7.8MB each in a few minutes.
Note: the script probably requires MRI 1.9.3 or a compatible Ruby.
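The core of the streaming idea can be sketched without any network code. This is not the original script, just an illustration under two assumptions: the EMR output is tab-separated text, and the data arrives in arbitrary chunks that may split a line in the middle. LineBuffer, to_pipe_delimited, and convert are hypothetical names.

```ruby
# Buffers arbitrary-sized chunks and yields each complete line.
class LineBuffer
  def initialize
    @pending = ''.dup
  end

  # Append a chunk and yield every complete line (without its newline).
  def feed(chunk)
    @pending << chunk
    while (idx = @pending.index("\n"))
      yield @pending.slice!(0..idx).chomp
    end
  end

  # Yield whatever is left once the stream is exhausted.
  def finish
    yield @pending unless @pending.empty?
    @pending = ''.dup
  end
end

# Assumption: one record per line, fields separated by tabs.
def to_pipe_delimited(line)
  line.split("\t", -1).join('|')
end

# Turn a sequence of chunks into pipe-delimited rows.
def convert(chunks)
  rows = []
  buffer = LineBuffer.new
  chunks.each do |chunk|
    buffer.feed(chunk) { |line| rows << to_pipe_delimited(line) }
  end
  buffer.finish { |line| rows << to_pipe_delimited(line) }
  rows
end

p convert(["a\tb\nc", "\td\ne\tf"])
# prints ["a|b", "c|d", "e|f"]
```

With aws-s3, the chunks would presumably come from the block form of S3Object.stream(key, bucket), with each chunk passed to feed; check the gem's README for the exact call before relying on it.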
Definitions: RabbitMQ
One of the hardest parts of learning a new domain is learning its language. By writing these definitions up, I hope to make them stick in my head longer...
- producer: creates messages
  - messages are objects
- consumer: receives and processes messages
  - since messages are objects, it's up to you to do the right thing
- queue: holding cell for messages waiting to be processed
  - FIFO
  - named, e.g., "test-queue"
- exchange: receives messages from producers and routes them to queues
  - You can have a named exchange, or use the default one, specified by an empty string.
  - types
    - fanout: sends the message to all queues bound to the exchange
    - direct: sends the message to the queues whose binding key exactly matches the message's routing key
    - topic: sends the message to the queues whose binding pattern matches the routing key; patterns can use the wildcards "*" (exactly one word) and "#" (zero or more words)
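To make the three exchange types concrete, here is a toy in-memory model of how each one picks queues. It is not a RabbitMQ client and talks to no broker; ToyExchange and its methods are invented for illustration, and the topic matching follows the usual AMQP convention of dot-separated words.

```ruby
# Toy model of exchange routing; purely illustrative.
class ToyExchange
  def initialize(type)
    @type = type
    @bindings = [] # pairs of [queue_name, binding_key]
  end

  def bind(queue_name, binding_key = '')
    @bindings << [queue_name, binding_key]
    self
  end

  # Return the names of the queues that would receive the message.
  def route(routing_key)
    matched = @bindings.select do |_queue, binding_key|
      case @type
      when :fanout then true
      when :direct then binding_key == routing_key
      when :topic
        topic_match?(binding_key.split('.'), routing_key.split('.'))
      else raise ArgumentError, "unknown exchange type: #{@type}"
      end
    end
    matched.map(&:first).uniq
  end

  private

  # '*' matches exactly one word, '#' matches zero or more words.
  def topic_match?(pattern, words)
    return words.empty? if pattern.empty?
    head, *rest = pattern
    if head == '#'
      (0..words.length).any? { |n| topic_match?(rest, words.drop(n)) }
    else
      !words.empty? && (head == '*' || head == words.first) &&
        topic_match?(rest, words.drop(1))
    end
  end
end

topic = ToyExchange.new(:topic)
topic.bind('eu', 'logs.eu.*').bind('everything', 'logs.#')
p topic.route('logs.eu.error') # prints ["eu", "everything"]
p topic.route('logs.us.info')  # prints ["everything"]
```

A fanout exchange ignores the keys entirely, a direct exchange needs an exact match, and a topic exchange is a direct exchange with wildcards.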