Skip to main content

hbase lzo compression on CentOS 6.3

The installation of hbase on CentOS is fairly painless thanks to those generous folks at Cloudera. Add their CDH4 repository and you're there: yum install hbase.

However, adding lzo compression for hbase is a little more tricky. There are a few guides describing how to checkout from github, build the extension, and copy the resulting libraries into the right place, but I want a nice, simple RPM package to deploy.

Enter the hadoop-lzo-packager project on github. Let's try and use this to build an RPM I can use to install lzo support for hbase.

Get the source code:

git clone git://github.com/toddlipcon/hadoop-lzo-packager.git

Install the deps:

yum install lzo-devel ant ant-nodeps gcc-c++ rpmbuild java-devel

Build the RPMs:

cd hadoop-lzo-packager
export JAVA_HOME=/usr/lib/jvm/java
./run.sh --no-debs

Et voila – cloudera-hadoop-lzo RPMS ready for installation. But wait… The libs get installed to /usr/lib/hadoop-0.20… That's no good, I want them in /usr/lib/hbase.

So I went ahead & hacked run.sh and template.spec to allow the install dir on the target machine to be specified on the command-line. I can now use a command line something like this:

./run.sh --name hbase-lzo --install-dir /usr/lib/hbase --no-deb

That produces a set of RPMs (binary, source, and debuginfo) with the base name hbase-lzo and libraries installed to /usr/lib/hbase

My changes (plus another small change adding necessary BuildRequires to the RPM spec template) are in my fork of the project on github

Comments

Popular posts from this blog

Python logging with rich - writing to stderr - plain output when writing to file

Rich is a Python library for writing rich text (with color and style) to the terminal, and for displaying advanced content such as tables, markdown, and syntax highlighted code. Rich provides RichHandler , a logging handler for python's logging module which will format and colorize text written by the module. However, RichHandler writes to stdout by default. More specifically, it writes to a rich Console object which, by default, writes to stdout. To make RichHandler write to stderr by default, you must pass in a Console object which has been configured to write to stderr: import logging from rich.console import Console from rich.logging import RichHandler DATEFMT = "%Y-%m- %d T%H:%M:%SZ" FORMAT = " %(message)s " logging . basicConfig( level = "NOTSET" , format = FORMAT, datefmt = DATEFMT, handlers = [RichHandler(console = Console(stderr = True ))], ) logger = logging . getLogger(__name__) logger . i...

Fix python import order on save in vim with ruff and ale

My IDE of choice is vim. I use various tools to perform linting and code formatting, and configure them all with ALE  (the Asynchronous Lint Engine). After using several discrete tools ( black , isort , flake8 , etc) I have settled on using Ruff to do my python code formatting and linting. Here's the relevant fragment of my ALE config in my .vimrc: " ALE config let g :ale_fixers = { \ 'python' : [ 'ruff' , 'ruff_format' ], \} let g :ale_linters = { \ 'python' : [ 'ruff' ], \} let g :ale_python_ruff_use_global = 1 One of the last remaining wrinkles I had was getting Ruff to automatically sort import statements. Sorting imports is performed by the Ruff linter, not the formatter, which is documented here . The fix on the command line is to add an option, like this: ruff check --select I --fix The difficulty I had was getting this to happen in the editor when the file was saved. It turns out, all I needed to do was ...

Escaping special characters in wget username or password

I recently offered to help out with the hosting of a WordPress  site. It’s currently hosted somewhere with no shell access – just ftp – and there are a lot of images to transfer. I quickly figured out I could use wget to mirror the site, using something like: wget -m ftp://username:password@example.com However, this broke in this case because the username for the site contained an @ character (the username was user@example.com ). Turns out the solution was to encode the special chars using HTML notation. This is the command that did the trick: wget -m ftp://user%40example.com:password@example.com