Kevin McFadden

The writings of an imperfect perfectionist

Mar 26, 2012
A hectic two weeks saw me doing too much mundane, busy work, but I still found time to start working with RabbitMQ and processin

Learnings

Processing Map Reduce Output by Line

I needed to consume Amazon Elastic Map Reduce (EMR) output and turn it into a format compatible with mysqlimport -- pipe-delimited in my case. What started out as a quick hack turned into something kind of nice. The key benefit of this script is that it processes your EMR result files by streaming them directly from S3, saving you the hassle of copying, processing, and purging! With a decent network connection (~800KB/s) I was able to parse 26 files of ~7.8MB each in a few minutes.

Note: the script probably requires Ruby (MRI) 1.9.3 or a compatible interpreter.
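
The conversion step on its own can be sketched as below. This is not the original script: it assumes the EMR output is tab-separated (Hadoop's default text output), that fields never contain a pipe, and it reads from a StringIO stand-in rather than from S3. The names to_pipe_delimited and convert are made up for illustration.

```ruby
require "stringio"

# Convert one tab-separated output line into a pipe-delimited line.
# Assumption: fields never contain a literal tab or pipe.
def to_pipe_delimited(line)
  # The -1 limit keeps trailing empty fields instead of dropping them.
  line.chomp.split("\t", -1).join("|")
end

# Read any IO-like source line by line, so a whole file is never held in memory.
def convert(input, output)
  input.each_line { |line| output.puts(to_pipe_delimited(line)) }
end

# Hypothetical sample data standing in for a streamed result file.
sample = StringIO.new("alpha\t3\nbeta\t\t7\n")
result = StringIO.new
convert(sample, result)
print result.string
```

Because convert only needs each_line on its input, the same method should work with a file handle or a network stream, which is what makes the no-copy approach possible.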

Definitions: RabbitMQ

One of the hardest parts of learning a new domain is learning the new language. By writing these up, I hope they will stick in my head longer...

  • producer: creates messages
    • messages are objects
  • consumer: receives and processes messages
    • since messages are objects, it's up to you to do the right thing
  • queue: holding cell for messages waiting to be processed
    • FIFO
    • named, e.g., "test-queue"
  • exchange: receives messages from producers and routes them to queues
    • You can have a named exchange, or use the default one, specified by an empty string.
    • types
      • fanout: sends the message to every queue bound to the exchange
      • direct: sends the message to the queues whose binding key exactly matches the message's routing key
      • topic: sends the message to the queues whose binding pattern matches the routing key; patterns can use wildcards.
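
To make the three exchange types concrete, here is a toy in-memory model of the routing decision. It does not talk to RabbitMQ and uses no client library; route, topic_match?, and the queue names are invented for the example. The wildcard rules follow AMQP topic exchanges, where "*" stands for exactly one dot-separated word and "#" for zero or more words.

```ruby
# Does an AMQP-style topic pattern match a routing key?
def topic_match?(pattern, routing_key)
  match_words(pattern.split("."), routing_key.split("."))
end

def match_words(pattern, words)
  return words.empty? if pattern.empty?
  head, *rest = pattern
  case head
  when "#"
    # "#" may absorb zero words, or one word and then try again.
    match_words(rest, words) ||
      (!words.empty? && match_words(pattern, words[1..-1]))
  when "*"
    !words.empty? && match_words(rest, words[1..-1])
  else
    words.first == head && match_words(rest, words[1..-1])
  end
end

# bindings maps a queue name to its binding key (hypothetical names).
# Returns the queues a message with the given routing key would reach.
def route(type, bindings, routing_key)
  case type
  when :fanout then bindings.keys
  when :direct then bindings.select { |_, key| key == routing_key }.keys
  when :topic  then bindings.select { |_, key| topic_match?(key, routing_key) }.keys
  end
end

bindings = {
  "all-logs" => "logs.#",
  "errors"   => "logs.*.error",
  "billing"  => "logs.billing.error"
}
p route(:direct, bindings, "logs.billing.error")
p route(:topic,  bindings, "logs.auth.warn")
```

With these bindings, the direct lookup reaches only the queue bound with the exact key, while the topic lookup for "logs.auth.warn" reaches only the queue bound with "logs.#".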
Mar 19, 2012
This week focuses on Hadoop and AWS Elastic Map Reduce, plus how to install the same version of Hadoop on OS X with Homebrew

Things I should remember by now

  • Sorting by a column in Unix AND specifying a tab as the column separator:

    sort -t$'\t' -k2,2n /tmp/file
    
  • Using awk to sum a column of numbers - Again, with the tab separator.

    awk -F'\t' '{ sum += $2 } END { print sum }'
    

Installing Hadoop on OS X with Homebrew

If you are using OS X and Homebrew, Hadoop can be installed with a simple:

brew install hadoop

However, if you want to use a version compatible with AWS, specifically 0.20.205.0, you need to hack the brew formula.

Before:

url 'http://www.apache.org/dyn/closer.cgi?path=hadoop/core/hadoop-1.0.1/hadoop-1.0.1.tar.gz'
md5 'e627d9b688c4de03cba8313bd0bba148'

After:

url 'http://www.apache.org/dyn/closer.cgi?path=hadoop/core/hadoop-0.20.205.0/hadoop-0.20.205.0.tar.gz'
md5 '8016D8A2A50CB2BEB17F2F45A1EA28DA'

Last I checked, this was the only way to do it w/o forking the project. Before running brew update, remember to cd /usr/local/ && git stash so your local edit is set aside. Afterwards, run git stash pop to re-apply it.

Note: If your map or reduce methods catch exceptions, make sure they don't hide problems. You may end up with a successful run but empty output.
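
One way to keep caught exceptions visible is to count and log every failure instead of silently skipping the record. The sketch below assumes a Hadoop Streaming mapper written in Ruby, which may not match the original jobs (they may well have been Java). The run_mapper name, the record format, and the counter group and name are all invented. The "reporter:counter:" line is the form Hadoop Streaming reads from stderr to update a job counter.

```ruby
require "stringio"

# Hypothetical mapper: parse "key<TAB>number" and emit the pair.
# Each bad record is logged and counted rather than swallowed, so a job
# that "succeeds" with empty output still shows what went wrong.
def run_mapper(input, output, errors)
  failed = 0
  input.each_line do |line|
    begin
      key, value = line.chomp.split("\t", 2)
      output.puts("#{key}\t#{Integer(value)}")
    rescue ArgumentError, TypeError => e
      failed += 1
      errors.puts("bad record #{line.chomp.inspect}: #{e.message}")
      # Counter update in the form Hadoop Streaming picks up from stderr.
      errors.puts("reporter:counter:Mapper,BadRecords,1")
    end
  end
  failed
end

# In a real streaming job this would be: run_mapper($stdin, $stdout, $stderr)
out = StringIO.new
err = StringIO.new
bad = run_mapper(StringIO.new("a\t1\nb\tx\nc\n"), out, err)
```

Rescuing only the specific parse errors, rather than every exception, means unexpected failures still stop the task instead of disappearing.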

Mar 12, 2012
I didn't need to learn a whole lot last week, but two useful items are MySQL's group_concat function and how to change the admin
  • Change Mac admin password without the disk: Very useful when employees leave and their password doesn't appear to work.

  • MySQL's Group Concat Function: I love this function! If you ever need to pull back a list of anything, e.g. table ids, this will put them all into one column, separated by a comma or whatever you specify.

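    SELECT GROUP_CONCAT(id) FROM authors;                -- comma-separated by default
    SELECT GROUP_CONCAT(id SEPARATOR '|') FROM authors;  -- custom separator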
    
Mar 5, 2012
An annotated collection of links I found useful from week 9 of 2012.

This is the inaugural post. I hope to capture all of the truly useful links I referenced for work and play in the previous week. This week will be a little short since I'm starting it on a Sunday…

The most useful article for me was Jay Fields' Alternatives for Redefining Methods. More about why it was so useful once my code clears a third party sanity check!

Dec 31, 2011

Thank you for using scriptogr.am. While we’re still in early beta development, we think you’ll enjoy the app. It’s designed to be fast, simple and to get the most creativity out of you.

scriptogr.am uses Markdown, a lightweight markup language, originally created by John Gruber and Aaron Swartz. Markdown is a text-to-HTML conversion tool for web writers. Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML). See the Syntax page for details pertaining to Markdown’s formatting syntax. You can try it out, right now, using the online Dingus.

Getting started

After connecting your Dropbox account to scriptogr.am, some necessary files and folders are added to your Dropbox at Apps/scriptogram. First, the GET_STARTED.txt text file, which explains pretty much exactly what you’re reading now. Next, we’ve added a posts folder. This is where you add your blog post (& page) files. These files are plain text files, but need to be saved with the .md (markdown) extension, like this: yourfile.md

We’ve added a post example page (this file) there for you to get familiar with.

The template data

All files need to contain a "front block". The front block must be the first thing in the file and takes the form of:

---
Date: 2012-04-17
Title: My first post
---

Between the triple-dashed lines, you can set any of the predefined variables (see below for a reference). But, the Title is required. Without the title, the system will fail.

Predefined global variables

All the variable names below are case-sensitive:

Required:

Title

The title of your post (or page)

Not required, but close to:

Date

The following date format is the correct one to use: 2001-12-18. (The Date variable can be used to ensure correct sorting of posts.)

Optional:

Published

Set to ’false’ if you don’t want a post to show up when the site is generated.

Type

Set to ’page’ if you want the post to act as a ’page’ instead of a ’post’.

Excerpt

Add an excerpt[1] to your post or page.

Difference between ’posts’ and ’pages’

A post is a blog post.

A page is similar to a post, but generates a link visible in the menu on your site that will lead to a page permalink.

Publishing your posts

This is simple. Just head to your admin panel and hit the ”Synchronize” button. When logged in to scriptogr.am and visiting your own page, you’ll see the scriptogr.am logotype symbol on the top right of the browser window. This is the link that leads to your admin panel.

Published vs unpublished

The total counts of published and unpublished posts (& pages) are visible next to the ”Synchronize” button. ”Unpublished” means that you either removed a post text file from your Dropbox or that something went wrong while trying to sync your Dropbox with scriptogr.am. A post also counts as unpublished if you’ve set Published: false on it.

Finally, happy posting. If you have any questions, suggestions or thoughts just drop us an e-mail at any time.


  1. An excerpt is a relatively small sample passage from a longer work, such as a book or article. 

Apr 17, 2011
When using Amazon's Elastic Load Balancer, and probably any load balancer, you lose normal access to the requestor's IP address.

UPDATE: Minor rewrite for the new blog hosting. Should have no incorrect info.

When using Amazon's Elastic Load Balancer, and probably any load balancer, you lose normal access to the requestor's IP address. ELB appends the missing IP address to the X-FORWARDED-FOR header, so if your application uses this information you will need to read it from that header. Header values are spoofable, but the solution is fairly simple. This solution is presented for Apache HTTPD.

X-FORWARDED-FOR is an HTTP header field for recording the originating IP address as a browser request passes through HTTP proxy and load balancer servers.

After enabling mod_headers (this may not be necessary, since SetEnvIf is provided by mod_setenvif), add the following line to your site's Apache configuration:

SetEnvIf X-FORWARDED-FOR (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s*$ UpstreamIpAddress=$1

This will set the UpstreamIpAddress variable to the last IP address in the list, which will always be the one accessing the load balancer. Obviously, any proxy servers, firewalls, or other servers that change the IP address before reaching the load balancer will conceal the true client IP address.
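
To see what that pattern captures, the same regular expression can be tried outside Apache. This is only a sketch: upstream_ip and LAST_IP are made-up names, and the addresses are documentation-range examples. It shows why a client-supplied (possibly spoofed) entry earlier in the list is ignored in favor of the one the load balancer appended last.

```ruby
# Same pattern as the SetEnvIf directive: the last IPv4-looking token
# before the end of the header value.
LAST_IP = /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s*$/

# Returns the captured address, or nil when the header has no match.
def upstream_ip(x_forwarded_for)
  match = LAST_IP.match(x_forwarded_for.to_s)
  match && match[1]
end

# The first entry came from the client and could be forged; the last one
# was appended by the load balancer.
puts upstream_ip("203.0.113.7, 198.51.100.23")
```

Like the Apache version, the pattern only checks the shape of the address (four groups of one to three digits), not that each group is a valid octet.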

If the IP address is important to you, you'll also want to update your access logger format:

LogFormat "%{UpstreamIpAddress}e %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined_using_upstream_ip_address

If you need this value in your application, you can look for it in the request environment variables:

Ruby on Rails: request.env
PHP: $_SERVER
Jun 17, 2010
In which I add my opinion of marketing to Zed and Giles.

UPDATED: While porting this article, it took me a while to remember why I wrote this. In order to remember the inspiration, I've fixed the reference links.

In a world of Whys and Zeds, I'm sure most people would choose Why. Both are probably certifiable, but one is gruffly entertaining and the other is obliquely clever. I've never met either of them, but I think I'd like to work with Zed. He'll tell it like it is and require your A game. He's a breath of fresh air in a world of posers (which isn't saying Why was a poser, but there are more self-promoters than actual rock stars).

Anyone who thinks marketing isn't lying isn't living in the real world. Marketing is about selling something to people who don't know they want it. Most products are crap -- it doesn't matter what the market is. Marketers need to use creative words and imagery to distinguish their product from the others floating in the cesspool. Have you ever noticed that every product or company is a "market leader"? That's not even creative. Do you really think Bud is noticeably better than Miller? Neither is even a contender, except in the marketing world.

Marketing truth would be awesome, if only it worked! If you are totally truthful, you'll be admitting your faults, which is equivalent to marketing your competitors because they won't be telling their whole story. How many politicians tell the whole truth? How about CEOs of global corporations sitting before Congress? If you are only telling half the story, that's halfway to lying, and I don't think most marketing comes close to speaking sooth. Caveat Emptor.