Nov 25


The following are my notes on how to set up a freshly installed CentOS 6.5 machine to compile the Hadoop native library. I needed to recompile the native library in order to deploy a modified version of it that uses hardware compression. In a previous post, “Integrating Hardware Accelerated Gzip Compression into Hadoop”, I described the process for modifying the Hadoop native library to use a hardware compression accelerator.

In that previous post I used Hadoop 2.2. In this post I’m using Hadoop 2.4 because that’s the version used by Ambari 1.6.0, the management tool I’ve chosen to administer my current cluster. Additionally, I was using Ubuntu 12.04 in the previous post, whereas in this post I’m using CentOS 6.5. Ubuntu is typically my default Linux flavor, but because it is not compatible with Ambari, I installed CentOS 6.5.

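All of these steps assume you are logged in as “root”. If you aren’t, prefix the commands below with “sudo”. Note that a redirection such as “> file” is performed by your own shell rather than by sudo, so the echo commands that write to /etc/profile.d/ need to run in a root shell (for example via “sudo sh -c '...'”).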

1. Install the development tools required to build the native library via the package manager. (Note: Some of these may already be installed)

# yum update
# yum groupinstall "Development Tools"
# yum install zlib-devel
# yum install zlib
# yum install xz
# yum install xz-devel
# yum install cmake
# yum install openssl
# yum install openssl-devel
# yum install openssh
# yum install autoconf


2. Optionally install these packages. (Note: These steps aren’t required to build Hadoop’s native library; I just find myself having to install them eventually.)

2.1 Install the kernel development package, vi, and git using the package manager
# yum install kernel-devel
# yum install vi
# yum install git

2.2 Install hexedit manually
# wget
# rpm -Uvh hexedit-1.2.10-1.el6.rf.x86_64.rpm

2.3 If you have an AHA Hardware Compression Accelerator installed, install/load the driver, and build the hardware accelerated zlib variant. See Steps 1-2 of this post.

3. Install Oracle Java JDK 1.7

Download Oracle Java JDK 7u71. The file should be named something similar to “jdk-7u71-linux-x64.tar.gz”
# mv jdk-7u71-linux-x64.tar.gz /opt/
# cd /opt/
# tar -xvf jdk-7u71-linux-x64.tar.gz
# echo -e "export JAVA_HOME=/opt/jdk1.7.0_71\nexport PATH=\${JAVA_HOME}/bin:\${PATH}" > /etc/profile.d/
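The same echo pattern is used for Maven and Ant below, so it is worth seeing what it actually writes. The following is a minimal sketch that writes the two export lines to a scratch directory rather than /etc/profile.d/ and then sources the result. The scratch directory and the file name java_demo.sh are hypothetical stand-ins, not the names used on the real system.

```shell
# Scratch directory standing in for /etc/profile.d/ (assumption: we do not
# want to touch the real system while experimenting).
demo_dir=$(mktemp -d)

# printf writes the same two lines that "echo -e" produces; the single quotes
# keep ${JAVA_HOME} and ${PATH} literal so they are expanded when sourced.
printf '%s\n' \
  'export JAVA_HOME=/opt/jdk1.7.0_71' \
  'export PATH=${JAVA_HOME}/bin:${PATH}' > "$demo_dir/java_demo.sh"

# Login shells source the *.sh files in /etc/profile.d/; do the same by hand.
. "$demo_dir/java_demo.sh"

echo "JAVA_HOME is $JAVA_HOME"
```

If the variables look right here, the same two lines should behave the same way once they live in a file under /etc/profile.d/ and you log in again.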


4. Install Maven 3.0.5

# cd /usr/local/
# wget
# tar -xvf apache-maven-3.0.5-bin.tar.gz
# ln -s apache-maven-3.0.5 maven
# echo -e "export M2_HOME=/usr/local/maven\nexport PATH=\${M2_HOME}/bin:\${PATH}" > /etc/profile.d/


5. Install Ant 1.9.4

# cd /usr/local/
# wget
# tar -xvf apache-ant-1.9.4.tar.gz
# ln -s /usr/local/apache-ant-1.9.4 ant
# echo -e "export ANT_HOME=/usr/local/ant\nexport PATH=\${ANT_HOME}/bin:\${PATH}" > /etc/profile.d/


6. Install Protobuf 2.5.0

# wget
# tar -xvf protobuf-2.5.0.tar.gz
# cd protobuf-2.5.0
# ./configure
# make install
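Before moving on to the build, it can save a failed Maven run to confirm that the tools from steps 1 through 6 are actually on the PATH. Here is a rough sketch, assuming a POSIX shell. The helper name missing_tools is made up for illustration, and the tool list is my reading of the steps above rather than an official Hadoop requirement list.

```shell
# Hypothetical helper: print the name of every argument that is not
# found on the PATH (one per line); prints nothing if all are present.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tools installed in steps 1-6 (assumed list, adjust to taste).
missing=$(missing_tools gcc make cmake java mvn ant protoc)

if [ -n "$missing" ]; then
  echo "Not found on PATH:" $missing
else
  # The Hadoop 2.x build checks the protoc version, so print it here.
  protoc --version
fi
```

If mvn, ant, or java show up as missing right after the installs, log out and back in (or source the new profile scripts) so the PATH changes take effect.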


7. Download Hadoop 2.4.0 source

# mkdir /usr/local/hadoop_src
# cd /usr/local/hadoop_src
# wget --no-check-certificate
# tar -xvf hadoop-2.4.0-src.tar.gz


8. Compile the Hadoop native library (Note: Make any changes to the native library, such as adding support for hardware compression, before this step)

# cd /usr/local/hadoop_src/hadoop-2.4.0-src
# mvn package -Pdist,native -DskipTests -Dtar


9. Back up Hadoop’s native library, and then replace it with the version compiled in step 8 (Note: The path to the Hadoop libraries below is the default location that Ambari installs to. Your installation path will vary depending on how you went about your Hadoop installation)

# cd /usr/local/hadoop_src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/lib/
# tar -cvf native_NEW.tar.gz native
# cp native_NEW.tar.gz /usr/lib/hadoop/lib/
# cd /usr/lib/hadoop/lib/
# tar -cvf native_ORIGINAL.tar.gz native
# rm -rf native
# tar -xvf native_NEW.tar.gz
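Because this step deletes a directory on a live installation, it may be worth rehearsing it on throwaway directories first. The sketch below mirrors the sequence above in a scratch area. Every path and file in it (build_lib, install_lib, the dummy libhadoop.so contents) is a stand-in invented for the rehearsal.

```shell
# Throwaway stand-ins for the build output and the installed Hadoop lib dir.
work=$(mktemp -d)
mkdir -p "$work/build_lib/native" "$work/install_lib/native"
echo "new build"   > "$work/build_lib/native/libhadoop.so"
echo "stock build" > "$work/install_lib/native/libhadoop.so"

# Archive the freshly built library and place it next to the installed one.
tar -C "$work/build_lib" -czf "$work/install_lib/native_NEW.tar.gz" native

# Back up the installed library, remove it, then unpack the new one.
cd "$work/install_lib"
tar -czf native_ORIGINAL.tar.gz native
rm -rf native
tar -xzf native_NEW.tar.gz

cat native/libhadoop.so
```

Rolling back is the same move in reverse: remove the native directory and unpack native_ORIGINAL.tar.gz in its place.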

That’s it!

Oct 13

Hadoop AHA HW GZIP Compression


The following is a set of instructions for integrating HW GZIP compression into a Hadoop DataNode. The compression enabled by these instructions is applied to the map phase’s output data as well as the reduce phase’s output. Compressing the map output reduces the number of bytes written to disk and then transferred across the network during the shuffle and sort stage of a job. Compressing the reduce output reduces the size of the data on the HDFS disk, as well as the time spent writing to the disk.

Sounds great, why wouldn’t one utilize GZIP compression? Why isn’t it enabled out of the box?

Well, the problem with GZIP compression is that it puts a high load on processors. As a result, it’s bypassed in favor of other “lighter” compression algorithms. The trade-off is that these other algorithms yield lower compression ratios. (see: GZIP or Snappy)

GZIP HW Compression provides the best of both worlds; the CPU load needed to compress data is offloaded to a dedicated piece of hardware, and the compression ratio remains high.

So, without further ado, let’s integrate an AHA HW Compression Accelerator into our Hadoop node.

Note: This procedure was performed using Hadoop 2.2 (single node cluster) on Ubuntu 12.04. The HW Compression utilized for this setup was an AHA372. Also, for the benefit of full disclosure, I’m a software engineer at AHA Products Group (which makes for easy access to GZIP accelerator cards to experiment with).


1. Install and load the AHA 3xx hardware compression driver.

1.1 After installing the AHA 3xx PCIe compression accelerator into an available PCIe slot, download and unpack the AHA3xx driver to a location of your choosing. I’ve created a folder at /usr/local/aha and placed the tarball in that directory

$ mkdir -p /usr/local/aha
$ cp AHA3xx_ver_3_1_0_20140711.tgz /usr/local/aha/
$ cd /usr/local/aha
$ tar -xvf AHA3xx_ver_3_1_0_20140711.tgz
$ cd AHA3xx_ver_3_1_0_20140711

1.2 Compile the driver

$ sudo ./install_driver


2. Compile and install the AHA zlib library

2.1 Compile the AHA zlib library

$ mkdir /usr/local/aha/zlib
$ cd /usr/local/aha/AHA3xx_ver_3_1_0_20140711/zlib
$ ./configure -shared -prefix=/usr/local/aha/zlib --hwdeflate
$ make install

2.2 Add the AHA zlib library’s path to the system’s library search path.

$ sudo sh -c 'echo "/usr/local/aha/zlib/lib/" > /etc/'

2.3 Verify that the AHA zlib library’s path is visible

$ sudo ldconfig
$ ldconfig -p | grep aha
(libc6,x86-64) => /usr/local/aha/zlib/lib/
(libc6,x86-64) => /usr/local/aha/zlib/lib/


3. Download and unpack the Hadoop source code from one of the Apache download mirrors to a location of your choosing. In this example, the source code was placed at “/home/hadoop/src/”


4. Install the tools required to build the Hadoop native library (if they aren’t already installed).

4.1. Install the tools that are available via the package manager.

$ sudo apt-get install build-essential
$ sudo apt-get install autoconf automake cmake g++
$ sudo apt-get install libtool zlib1g-dev pkg-config libssl-dev
$ sudo apt-get install maven

4.2. Manually install Protobuf 2.5 which is not available via the package manager.

$ wget
$ tar -xvf protobuf-2.5.0.tar.gz
$ cd protobuf-2.5.0
$ ./configure --prefix=/usr
$ sudo make install


5. Compile and load the Hadoop native library to verify that it functions correctly without AHA zlib modifications.

5.1 Compile the Hadoop native library. Note: Compilation takes several minutes. On my machine, it took about seven minutes.

$ cd /home/hadoop/src/hadoop-2.2.0-src/
$ mvn package -Pdist,native -DskipTests -Dtar

5.2 Copy the native library from the source over to your Hadoop installation. (be sure to backup the existing native library)

$ cp /home/hadoop/src/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0/lib/native/* YOUR_HADOOP_LOCATION/lib/native/


6. Edit the Hadoop configuration file at YOUR_HADOOP_LOCATION/etc/hadoop/mapred-site.xml to enable map-output and reduce-output compression. Add the following lines before the “</configuration>” line.
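
A minimal sketch of the kind of properties this step calls for, assuming the stock Hadoop 2.x property names for map-output and job-output compression and GzipCodec as the codec. Both are assumptions on my part, so check them against the mapred-default.xml that ships with your version. The snippet only writes the fragment to a scratch file for review; the file name is arbitrary, and the fragment still has to be pasted into mapred-site.xml by hand.

```shell
# Write the candidate <property> entries to a scratch file for review.
fragment=$(mktemp)
cat > "$fragment" <<'EOF'
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
EOF

# Quick sanity check: every opened <property> should also be closed.
opened=$(grep -c '<property>' "$fragment")
closed=$(grep -c '</property>' "$fragment")
echo "properties opened: $opened, closed: $closed"
```

The first pair covers the intermediate map output and the second pair covers the final job output, which matches the two things verified in step 7.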







7. Run a Hadoop job that has a reduce stage and verify compression is working for both the map output as well as the reduce output.

7.1 “Wordcount” is a Hadoop routine that produces intermediate map output data, and thus can be used to test compression. To set this up, first, copy text files from your local file system to the HDFS file system.

$ wget
$ wget
$ wget
$ bin/hdfs dfs -mkdir /books
$ bin/hdfs dfs -put 20417.txt.utf-8 /books/
$ bin/hdfs dfs -put 5000.txt.utf-8 /books/
$ bin/hdfs dfs -put 4300.txt.utf-8 /books/

7.2 Run wordcount on the dataset. In the statistics that come back from the job, note the number with the prefix “Map Output Materialized Bytes:”

$ YOUR_HADOOP_LOCATION/bin/hadoop jar YOUR_HADOOP_LOCATION/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /books /booksoutput

Map Output Materialized Bytes: 479752

7.3 Comment out the properties added in step 6, to disable compression. Rerun the wordcount job. Verify that the “Map Output Materialized Bytes:” value is larger than in the previous run. This verifies that compression is being utilized on the map-output data.

$ bin/hdfs dfs -rm -r /booksoutput
$ YOUR_HADOOP_LOCATION/bin/hadoop jar YOUR_HADOOP_LOCATION/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /books /booksoutput

Map Output Materialized Bytes: 1459156
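
The two counters can be turned into a compression ratio with a line of awk. This is just arithmetic on the numbers reported above; the variable names are mine.

```shell
compressed=479752      # Map Output Materialized Bytes with compression on
uncompressed=1459156   # the same counter with compression off

# awk handles the floating-point division that plain shell arithmetic lacks.
ratio=$(awk -v u="$uncompressed" -v c="$compressed" 'BEGIN { printf "%.2f", u / c }')
saved=$(awk -v u="$uncompressed" -v c="$compressed" 'BEGIN { printf "%.1f", (1 - c / u) * 100 }')

echo "ratio ${ratio}:1, ${saved}% fewer bytes"
```

For this run that works out to roughly 3:1, meaning about two thirds of the shuffle bytes never hit the disk or the network.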

7.4 To verify that compression is being applied to the reduce-output data, examine the destination directory in HDFS. The resulting ‘part’ files should have a .gz suffix.

$ bin/hdfs dfs -ls /booksoutput

-rw-r--r-- 1 hadoop supergroup 0 2014-10-16 10:08 /booksoutput/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 305714 2014-10-16 10:08 /booksoutput/part-r-00000.gz

7.5 Turn compression back on by restoring the properties in step 6.


8. Modify the native library CMake configuration and source code to point the compression library to the AHA HW compression.

8.1 Modify the Hadoop native library’s CMake file to point to AHA HW compression’s version of zlib.

$ vim hadoop-common-project/hadoop-common/target/native/CMakeCache.txt




8.2 Modify the native library’s zlib library macro.

$ vim hadoop-common-project/hadoop-common/target/native/config.h



8.3 Edit the native library’s decompression c source such that symbols from AHA zlib library are loaded instead of those from the standard zlib library.

$ vim hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zlib/ZlibDecompressor.c

Change these lines:

LOAD_DYNAMIC_SYMBOL(dlsym_inflateInit2_, env, libz, "inflateInit2_");
LOAD_DYNAMIC_SYMBOL(dlsym_inflate, env, libz, "inflate");
LOAD_DYNAMIC_SYMBOL(dlsym_inflateSetDictionary, env, libz, "inflateSetDictionary");
LOAD_DYNAMIC_SYMBOL(dlsym_inflateReset, env, libz, "inflateReset");
LOAD_DYNAMIC_SYMBOL(dlsym_inflateEnd, env, libz, "inflateEnd");

To these:

LOAD_DYNAMIC_SYMBOL(dlsym_inflateInit2_, env, libz, "AHA_inflateInit2_");
LOAD_DYNAMIC_SYMBOL(dlsym_inflate, env, libz, "AHA_inflate");
LOAD_DYNAMIC_SYMBOL(dlsym_inflateSetDictionary, env, libz, "AHA_inflateSetDictionary");
LOAD_DYNAMIC_SYMBOL(dlsym_inflateReset, env, libz, "AHA_inflateReset");
LOAD_DYNAMIC_SYMBOL(dlsym_inflateEnd, env, libz, "AHA_inflateEnd");


8.4 Edit the native library’s compression c source such that symbols from AHA zlib library are loaded instead of those from the standard zlib library.
$ vim hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zlib/ZlibCompressor.c

Change these lines:

LOAD_DYNAMIC_SYMBOL(dlsym_deflateInit2_, env, libz, "deflateInit2_");
LOAD_DYNAMIC_SYMBOL(dlsym_deflate, env, libz, "deflate");
LOAD_DYNAMIC_SYMBOL(dlsym_deflateSetDictionary, env, libz, "deflateSetDictionary");
LOAD_DYNAMIC_SYMBOL(dlsym_deflateReset, env, libz, "deflateReset");
LOAD_DYNAMIC_SYMBOL(dlsym_deflateEnd, env, libz, "deflateEnd");

To these:

LOAD_DYNAMIC_SYMBOL(dlsym_deflateInit2_, env, libz, "AHA_deflateInit2_");
LOAD_DYNAMIC_SYMBOL(dlsym_deflate, env, libz, "AHA_deflate");
LOAD_DYNAMIC_SYMBOL(dlsym_deflateSetDictionary, env, libz, "AHA_deflateSetDictionary");
LOAD_DYNAMIC_SYMBOL(dlsym_deflateReset, env, libz, "AHA_deflateReset");
LOAD_DYNAMIC_SYMBOL(dlsym_deflateEnd, env, libz, "AHA_deflateEnd");
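
Since steps 8.3 and 8.4 make the same mechanical change to ten lines, the edit can also be scripted. The following is a sketch using sed on a scratch file that stands in for the two C sources. The regular expression is my own and assumes the macro calls are formatted exactly as shown above, so diff the result before trusting it on the real files.

```shell
# Scratch file standing in for ZlibCompressor.c / ZlibDecompressor.c.
src=$(mktemp)
cat > "$src" <<'EOF'
LOAD_DYNAMIC_SYMBOL(dlsym_deflateInit2_, env, libz, "deflateInit2_");
LOAD_DYNAMIC_SYMBOL(dlsym_inflate, env, libz, "inflate");
EOF

# Prefix only the quoted symbol name (the last macro argument) with AHA_.
# Anchoring on 'libz, "' leaves the dlsym_* identifiers untouched.
sed -E 's/(libz, ")((in|de)flate[A-Za-z0-9_]*")/\1AHA_\2/' "$src" > "$src.new"

cat "$src.new"
```

Writing to a separate .new file keeps the original around for a quick diff before anything is overwritten.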


9. Compile and install the newly modified Hadoop native library.

9.1 Compile the Hadoop native library. Note: As before, compilation takes several minutes. On my machine, it took about seven minutes.

$ cd /home/hadoop/src/hadoop-2.2.0-src/
$ mvn package -Pdist,native -DskipTests -Dtar

9.2 Copy the native library from the source over to your Hadoop installation.
$ cp /home/hadoop/src/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0/lib/native/* YOUR_HADOOP_LOCATION/lib/native/


10. Rerun the wordcount job from step 7 (making sure the compression properties are enabled in mapred-site.xml) to verify everything is working as expected. Your Hadoop node is now using HW compression to deflate intermediate map output data.

Sep 07


For me, the process of learning something new from a programming book is as follows:

1. Buy two or more books that cover the same material.
Having two perspectives on the same topic fills in holes that one author left because he/she is so familiar with the topic that he/she assumes the missing information is common knowledge. Additionally, different authors tend to focus on different aspects of a topic; combining these different emphases, in my estimation, makes for a more rounded experience.

I was going to write that I don’t do this for advanced topics, but as I look at my bookcase (yeah, most of my books are still paper), that isn’t true. The only reason I don’t have more than one book on a topic I’m interested in is that a second doesn’t exist.

2. Have a small project in progress as you make your way through the book.
With some code already started (or at least planned out), the material is relevant to a task I’m trying to accomplish. For me this makes some of the material easier to absorb as I’m paying attention out of necessity. Without a project in mind, in the best case, things that seem useful are placed in the region of my brain where they are abbreviated so they can be looked up again. In the worst case, they are completely forgotten.

May 25


I’ve been quietly watching a few companies’ reputations on Glassdoor fall for about a year now. It’s a massacre. The worst part is that most of the blood drawn comes from these companies repeatedly shooting themselves in the foot.

For companies there is a silver lining to a stretch of bad reviews: it tells you what your employees’ gripes are without a filter. This is about as close as you’re going to get to reading their minds. Obviously an organization doesn’t want to broadcast its dysfunction to potential candidates, but there is some good to be made out of it.

With this in mind, you can make it a whole lot worse. Here’s how…

The absolute worst thing you can do is to start writing fake positive reviews; it WILL backfire. Beyond being unethical, to anyone with half a brain these stick out like a sore thumb. Yes, even the fake lukewarm reviews stick out. Ultimately, fake reviews insult the intelligence of the candidates that may come into the company. They also make the company look like it is hiding something rather than patching it up. To the content employees these will be a source of embarrassment. To those disgruntled employees or ex-employees on the sidelines, you’re inviting them to write negative reviews. This is a hydra you’re better off walking away from. This is what I call “shooting yourself in the foot”.

The second foot gets shot off when the positive reviews start criticizing the negative reviews. Don’t do this. This is a flag to everyone observing that, if it’s not already a fake review (which it likely is), it’s written by a company tool.  In the best circumstances a prospective employee would likely just ignore these.

The third foot (if there was such a thing) gets shot off when someone at the company responds to the negative reviews without addressing the issue.  The only response that is warranted is something like “We strive to foster an exciting, inclusive, dynamic team environment, but we have apparently fallen short. We apologize, and will take your comments into consideration so that we may improve in the future. Please give me a call so that we can further discuss this with you”.  If the negative review rails on about mistreatment but has a minor compliment about office decor, don’t thank them for the compliment without addressing the larger issue.  One might think this is something that doesn’t need to be said, but I’ve actually seen this occur. I can only guess that the person writing on behalf of the company believed they could perform some sort of Jedi mind trick causing the negative review to be erased from the mind of the reader.

What it comes down to is that many might consider interviewing with a company with a stretch of bad reviews, but would likely ask about those bad reviews at the interview. A couple of bad reviews aren’t going to kill your reputation. Readers know that anonymity emboldens the fringes of society. With that said, most (myself included) wouldn’t touch a company with a swath of fake reviews with a ten-foot pole.

As in all things, being honest and respectful of the people around you seems like the better path. You might as well start down that path with your employees before they’ve even shown up for their first day of work. I’ll bet you can argue yourself out of taking that path, but doing so is ill-fated if you want to convince people to spend any time out of their career with your organization.


May 23


1. Visit the CentOS web page and download the iso image you’d like to boot from.
2. When the download has completed, open Terminal and use ‘hdiutil’ to convert the *.iso to an *.img file (specifically, a UDIF read/write image). Note that hdiutil appends a .dmg extension to the output name, as the last line of its output shows.

$ hdiutil convert -format UDRW -o target.img CentOS-7.0-1406-x86_64-Everything.iso
Reading Master Boot Record (MBR : 0)…
Reading CentOS 7 x86_64 (Apple_ISO : 1)…
Reading (Type EF : 2)…
Reading CentOS 7 x86_64 (Apple_ISO : 3)…
Elapsed Time: 33.590s
Speed: 200.5Mbytes/sec
Savings: 0.0%
created: /tmp/target.img.dmg

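3. Use ‘diskutil list’ to identify your USB drive (disk1 in this example), unmount it, and use the ‘dd’ utility to copy the image to it. Double-check the disk number, since dd will overwrite whatever device it is pointed at: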

$ diskutil list
0: GUID_partition_scheme *121.3 GB disk0
1: EFI EFI 209.7 MB disk0s1
2: Apple_HFS Macintosh HD 120.5 GB disk0s2
3: Apple_Boot Recovery HD 650.0 MB disk0s3
0: FDisk_partition_scheme *31.9 GB disk1
1: DOS_FAT_32 NO NAME 31.9 GB disk1s1
0: CentOS_7.0_Final *4.5 GB disk2
$ diskutil unmountDisk /dev/disk1
Unmount of all volumes on disk1 was successful
$ diskutil unmountDisk /dev/disk2
Unmount of all volumes on disk2 was successful
$ time sudo dd if=target.img.dmg of=/dev/disk1 bs=1m
4261+0 records in
4261+0 records out
4467982336 bytes transferred in 1215.483272 secs (3675890 bytes/sec)

4. You should be done! Boot from the USB drive on your target machine.

May 09


Below is my Objective-C niggle list. Beyond listing my niggles, I’ve also included what I think is a proper resolution for each complaint and why I think it is a correct resolution. I’d love to hear from anybody who disagrees and why they disagree.

I should note that I’ve, at some point in the past, committed some (probably all) of the Objective-C coding sins I’ve called out below. whattayagunnado…



Instead of this:

#define kFrameMarginTop 50
#define kFrameMarginBottom 50

Do this:

static const NSInteger kFrameMarginTop = 50;
static const NSInteger kFrameMarginBottom = 50;

Here’s Why:
The macro doesn’t explicitly indicate any type information.




Instead of this:

NSNumber *myNumber = [NSNumber numberWithInteger:42];
NSArray *myHeroes = [NSArray arrayWithObjects:@"Superman", @"Spiderman", nil];
NSMutableArray *myOtherHeroes = [NSMutableArray arrayWithObjects:@"Black Panther", @"Moon Knight", nil];
NSDictionary *myContacts = [NSDictionary dictionaryWithObjectsAndKeys:@"John", @"FirstName", @"Doe", @"LastName", nil];

Do this:

NSNumber *myNumber = @42;
NSArray *myHeroes = @[@"Superman", @"Spiderman"];
NSMutableArray *myOtherHeroes = [@[@"Black Panther", @"Moon Knight"] mutableCopy];
NSDictionary *myContacts = @{@"FirstName":@"John", @"LastName":@"Doe"};

Here’s Why:
The literal syntax is easier on the eyes. Not using the literal syntax signals that you stopped paying attention to advances in clang sometime before 2012.




Instead of this:

enum BroadcastState {
    // ... enumerators ...
};
typedef enum BroadcastState BroadcastState;

Do this:

typedef NS_ENUM(NSUInteger, BroadcastState) {
    // ... enumerators ...
};

Here’s Why:
NS_ENUM makes the enumeration’s underlying type explicit.



No ops

Instead of this:

if( someVar != nil )
  [someVar doSomething];

Do this:

[someVar doSomething];

Here’s Why:
Sending a message to nil is a no-op, so the nil check adds nothing.



Sep 22

It turns out that getting rid of those wacky new constraints you see attached to user interface elements in Interface Builder is easy. To remove auto layout constraints in Xcode 4.5, just uncheck the “Use Auto Layout” box in the File Inspector while viewing the layout in Interface Builder. Honestly, if you’re building for iOS 6 and beyond you’re probably going to have to get used to them. To me, this heralds the end of the days where we only had a couple of aspect ratios to design for.

Sep 01
Next Year Will be the Year of the Linux Desktop

Always next year...

Aug 18

iOS SDK Tutorial: Simple Multithreaded Programming with Grand Central Dispatch
This tutorial demonstrates how one might begin to use Grand Central Dispatch to write a multithreaded application in iOS for the iPhone/iPod Touch/iPad using Objective-C with Xcode.

May 25

Mary Millbee's Memory Match - New Card Set!

The newest version of Mary Millbee’s Memory Match is live! This update adds a new set of number & letter cards.
