Wednesday, February 22, 2012

Book Review for MEAP "Hadoop in Practice"


Big Data is a hot topic, and a fast-moving one as well. In that space, Hadoop is a big player. Even this early access edition shows the book's sweet spot: the areas other books have missed.

Hadoop, mature enough to have been recognized in mainstream media, is still fast moving. Like any framework, it constrains its users to fairly rigid usage patterns-- but users are finding ways around these limits. This book introduces you to some of those workarounds, opening up uses of Hadoop that would otherwise be out of bounds for you.

The book is a MEAP (Manning Early Access Program) edition, which means it contains less than the full content. In this case, it contains just a few chapters, good for 176 pages. The table of contents promises about a dozen more chapters and a few more appendices, so this could be a big book when it's done. Time will tell how those final chapters take shape; the ones that are present are rich enough that a little consolidation wouldn't be surprising.

The first chapter introduces the basics of Hadoop, and includes some excellent diagrams. Pictures can often bring clarity that words don't, and I really like plain and simple diagrams that help me grasp the big picture. This book does well in this regard. Besides the overview, we get a quick glimpse of related and complementary technologies, the restrictions of using Hadoop, and alternatives to Hadoop. Versions of Hadoop are covered in two dimensions: the various distributions, and what's contained in forward-looking iterations. The chapter wraps up with a brief section on installing and configuring Hadoop for a first run.

The next chapter we're given is chapter 3, which covers data serialization tools and techniques. It has more good pictures, along with explanations of how you can use Hadoop to process XML, JSON, Google's Protocol Buffers, and Facebook's Thrift. Each of these gets its own section explaining how you might use it. There are plenty of references to Elephant Bird, an open source project maintained by Twitter. You also learn how to handle custom file formats if you need them.

The final chapter of this early access book is on HDFS tuning techniques. The author tells you why Hadoop is not well suited to processing lots of small files, and how you can get around this limitation. The chapter also covers choosing the best compression codec for your particular needs. When working with large amounts of data, choosing the right compression tools can make a huge difference in performance, so this chapter should be of high interest to those who are heading for production environments.

So, what's the verdict? I found the book's contents to be of high value and reflective of the real-world knowledge that Hadoop users will need. I don't think the book is suitable as the sole resource for new users of Hadoop. (If you're new, I'd suggest buying two books-- one to learn the basics, then this one for when you've gone past the newbie phase.) The book is fairly raw-- some chapters seem a little thrown together in places, and the content falls short of what the table of contents promises. But you'll get updates with your MEAP purchase, and you can have some valuable content now. All things considered, I'd recommend this book for Hadoop users who are beyond the initial learning phases.

The book can be found here.

Happy Hadooping!

Saturday, February 11, 2012

Powerful Java Hacking made easy - use good judgement!


A very powerful hack - use with caution!

Here's a very quick tutorial on how you can use Byteman to change a running application on the fly. You can insert arbitrary code into a running application server. Caveat CodeMonkeytor!

Let's say we have a JEE 6 application that's misbehaving. You can easily attach Byteman to the app server, then inject the code you want-- without bringing down the server!

I like to use two scripts with Byteman: one to attach to the server (since this should be done only once), and one to check and install the Byteman rules I want to run. This way, you can add to your Byteman rules incrementally.

There's an example of a misbehaving Servlet at the bottom of this post. It's bad code, but let's pick on a single fault. The user is asked to provide two numbers, and the servlet is supposed to divide them and show the result. But if the user enters a zero for the second number, the integer division throws an ArithmeticException (/ by zero)! Please install the servlet and try it for yourself.
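If you'd like to see the fault before deploying anything, here's a minimal standalone reproduction. The class DivideByZeroDemo and its tryDivide helper are hypothetical; makeDivision mirrors the servlet's method of the same name, and the answer text mimics what the servlet prints.

```java
// DivideByZeroDemo.java: hypothetical standalone reproduction of the servlet's fault.
public class DivideByZeroDemo {

    // Same arithmetic as SimpleServlet.makeDivision: plain int division.
    static int makeDivision(int first, int second) {
        return first / second;
    }

    // Returns the answer text, or a description of the exception if the division blows up.
    static String tryDivide(int first, int second) {
        try {
            return first + " divided by " + second + " equals " + makeDivision(first, second);
        } catch (ArithmeticException e) {
            return "blowup: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryDivide(10, 2)); // 10 divided by 2 equals 5
        System.out.println(tryDivide(10, 0)); // blowup: java.lang.ArithmeticException: / by zero
    }
}
```

Inside the servlet nothing catches the exception, so it escapes doGet and the app server reports the error instead of an answer.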

Making JEE apps is trivial now that JEE 6 is here. I'd recommend using JBoss AS 7, and building with Maven. (I used to hate Maven, but since I've started using it, I hate it a little less. Still, it seems better than Ant in easy cases.) You'll find the Maven pom.xml at the bottom of this post, along with the Servlet code.

But this is a post about Byteman, so I'm going to put that code first. Here are the scripts:

#!/bin/bash
# InstallByteman.sh - run this only once, after JBoss is already running.
# It doesn't matter if JBoss has been running 7 months or 7 seconds.
export JBOSS_HOME=/home/rick/Tools/JBoss/AS7/jboss-as-7.1.0.Beta1b
export BYTEMAN_HOME=/home/rick/Tools/Byteman/byteman-2.0
export BYTEMAN_BIN=$BYTEMAN_HOME/bin
# -b puts the Byteman agent jar on the bootstrap classpath; -Dorg.jboss.byteman.transform.all
# also allows rules to target java.lang classes. The last argument identifies the target JVM
# (a pid, or the name jps -l reports; for AS 7 that's the path to jboss-modules.jar).
$BYTEMAN_BIN/bminstall.sh -b -Dorg.jboss.byteman.transform.all $JBOSS_HOME/jboss-modules.jar
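Why no restart? Byteman is a Java agent: as I understand it, bminstall.sh uses the JVM's Attach API to load the agent jar into the already-running process. The sketch below is a hypothetical, do-nothing agent built on the standard java.lang.instrument API, just to show the hook Byteman plugs into; it is not Byteman's code. To be loaded this way, its jar would need an Agent-Class manifest entry naming ClassNameLoggerAgent.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical minimal agent; NOT Byteman's implementation.
public class ClassNameLoggerAgent {

    // Invoked by the JVM when this agent's jar is attached to an already-running process.
    public static void agentmain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new ClassNameLogger(), false);
    }

    // Records the internal name of each class handed to it; never changes any bytecode.
    static class ClassNameLogger implements ClassFileTransformer {
        final List<String> seen = new CopyOnWriteArrayList<>();

        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            seen.add(className);
            return null; // null tells the JVM to keep the class unchanged
        }
    }
}
```

A transformer registered like this only sees classes loaded after it's added. Byteman goes further and, as I understand it, retransforms classes that are already loaded, which is what lets a rule reach code that is already running.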

#!/bin/bash
# InstallRules.sh - This script validates and installs your rules. You can run it as many
# times as you need, to build your rules up incrementally.
export JBOSS_HOME=/home/rick/Tools/JBoss/AS7/jboss-as-7.1.0.Beta1b
export BYTEMAN_HOME=/home/rick/Tools/Byteman/byteman-2.0
export BYTEMAN_BIN=$BYTEMAN_HOME/bin
# Export this so Byteman can validate your rules against your classes. This is wherever you have compiled your classes to.
export APP_TGT=/home/rick/Blog_Temp/Byteman2/simpleappservlet30/target/classes
# check it: -cp takes the class directory, and the rule file is a separate argument
$BYTEMAN_BIN/bmcheck.sh -cp $APP_TGT HackFix.btm
# add the rule to the running agent
$BYTEMAN_BIN/bmsubmit.sh HackFix.btm

The install script above refers to a Byteman rule file named 'HackFix.btm'. Here it is.

(BTW, what this rule does is change the second argument passed to the method 'makeDivision' into a 1 if it is a zero, to prevent the divide-by-zero. In a Byteman rule, $1 and $2 refer to the method's first and second parameters.)

RULE Hack Fix for Divide Servlet
CLASS com.flyingdog.SimpleServlet
METHOD makeDivision
AT ENTRY
IF 0 == $2
DO $2 = 1;
ENDRULE
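To make the rule's effect concrete, here is a source-level picture of how makeDivision behaves once HackFix.btm is active. It's only an approximation: Byteman injects the rule's logic into the method's bytecode at entry, and the class name HackFixEffect is hypothetical.

```java
// HackFixEffect.java: hypothetical source-level view of makeDivision with HackFix.btm applied.
public class HackFixEffect {

    static int makeDivisionWithRule(int first, int second) {
        // AT ENTRY: IF 0 == $2 DO $2 = 1
        if (0 == second) {
            second = 1;
        }
        // original method body, unchanged
        return first / second;
    }

    public static void main(String[] args) {
        System.out.println(makeDivisionWithRule(10, 2)); // 5
        System.out.println(makeDivisionWithRule(10, 0)); // 10, and no exception
    }
}
```

Note that it's truly a hack: makeAnswer builds its message from the original request strings, so the servlet will now claim that "10 divided by 0 equals 10".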

--------------------------------------------------------------------------------------------

So that's all the Byteman stuff above. You can use those samples to attach to a running application server (ANY Java application server, not just JBoss; for another server, point bminstall.sh at that server's process) and inject whatever arbitrary code you want. Cool (and dangerous), huh?




Ok, if you're like me, you'd love a quick way to try that out. So here's the pom.xml for the SimpleServlet...

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.flyingdog</groupId>
  <artifactId>notsogoodapp</artifactId>
  <packaging>war</packaging>
  <version>1.0</version>
  <name>notsogoodapp</name>
  <url>http://maven.apache.org</url>
  <build>
    <finalName>notsogoodapp</finalName>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-war-plugin</artifactId>
        <version>2.1-beta-1</version>
        <configuration>
          <failOnMissingWebXml>false</failOnMissingWebXml>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <repositories>
    <repository>
      <id>java.net</id>
      <url>http://download.java.net/maven/2</url>
    </repository>
  </repositories>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>javax</groupId>
      <artifactId>javaee-api</artifactId>
      <version>6.0</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
</project>

Now the bad Servlet. Install it and have it divide a few numbers. Especially give it a zero to divide by, and notice the ugly blowup! Then use the Byteman scripts above to 'fix' the problem.


package com.flyingdog;

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(urlPatterns = {"/simpleservlet", "*.foo"})
public class SimpleServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest request,
            HttpServletResponse response) {
        doGet(request, response);
    }

    @Override
    protected void doGet(HttpServletRequest request,
            HttpServletResponse response) {
        try {
            response.setContentType("text/html");
            PrintWriter printWriter = response.getWriter();
            printWriter.println("<h2>");
            printWriter.println("</h2>");
            // if there is a value there, try to provide an answer
            if (null != request.getParameter("firstValue")) {
                printWriter.println(makeAnswer(request));
            }
            // print the form that requests numbers
            printWriter.println(makeForm());
        } catch (IOException ioException) {
            ioException.printStackTrace();
        }
    }

    private String makeAnswer(HttpServletRequest request) {
        String first = request.getParameter("firstValue");
        String second = request.getParameter("secondValue");
        // Another fault: parseInt throws NumberFormatException on empty or non-numeric input
        int iFirst = Integer.parseInt(first);
        int iSecond = Integer.parseInt(second);
        int answer = makeDivision(iFirst, iSecond);
        String response = first + " divided by " + second + " equals " + answer;
        return response;
    }

    // The fault this post picks on: throws ArithmeticException when second is 0
    private int makeDivision(int first, int second) {
        return first / second;
    }

    private String makeForm() {
        StringBuilder sb = new StringBuilder();
        sb.append("<form method=\"post\" action=\"simpleservlet\">\n");
        sb.append("<table cellpadding=\"0\" cellspacing=\"0\" border=\"0\">\n");
        sb.append(" <tr>\n");
        sb.append(" <td>Please enter two numbers to divide:</td>\n");
        sb.append(" <td><input type=\"text\" name=\"firstValue\" /></td>\n");
        sb.append(" <td><input type=\"text\" name=\"secondValue\" /></td>\n");
        sb.append(" </tr>\n");
        sb.append(" <tr>\n");
        sb.append(" <td></td>\n");
        sb.append(" <td><input type=\"submit\" value=\"Submit\"></td>\n");
        sb.append(" </tr>\n");
        sb.append("</table>\n");
        sb.append("</form>\n");
        return sb.toString();
    }
}
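The Byteman rule is a stopgap; the lasting fix belongs in the source. Here's one hypothetical way to do it: a helper (SafeDivide.divideMessage, not part of the original servlet) that makeAnswer could delegate to, reporting bad input instead of throwing. It also covers the fault we didn't pick on, since Integer.parseInt throws NumberFormatException for an empty, missing, or non-numeric field.

```java
// SafeDivide.java: hypothetical input-checking replacement for the servlet's makeAnswer logic.
public class SafeDivide {

    // Builds the answer text, or a friendly message when the input can't be divided.
    static String divideMessage(String first, String second) {
        int iFirst;
        int iSecond;
        try {
            iFirst = Integer.parseInt(first);
            iSecond = Integer.parseInt(second); // parseInt(null) also throws NumberFormatException
        } catch (NumberFormatException e) {
            return "Please enter two whole numbers.";
        }
        if (iSecond == 0) {
            return "Cannot divide " + first + " by zero.";
        }
        return first + " divided by " + second + " equals " + (iFirst / iSecond);
    }

    public static void main(String[] args) {
        System.out.println(divideMessage("10", "2")); // 10 divided by 2 equals 5
        System.out.println(divideMessage("10", "0")); // Cannot divide 10 by zero.
        System.out.println(divideMessage("10", ""));  // Please enter two whole numbers.
    }
}
```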


So there you have it. To recap:
  1. Use the pom.xml and the Servlet code to build the .war (for example, with mvn package).
  2. Deploy the .war to a JEE 6 app server, like JBoss AS 7.
  3. Access the servlet at http://localhost:8080/notsogoodapp/simpleservlet
  4. Observe how it divides two numbers. Let it try to divide by zero, and see the blowup.
  5. Run InstallByteman.sh once, then InstallRules.sh, and see how the blowup stops.
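If you'd rather drive steps 3 to 5 from the command line, here's a small hypothetical client (DivideClient is not from the original post). It assumes the default http://localhost:8080 address from step 3 and sends the two form fields as GET parameters, which the servlet reads through request.getParameter just as it reads the POSTed form.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;

// Hypothetical command-line client; assumes the servlet is deployed at the URL from step 3.
public class DivideClient {

    static final String BASE = "http://localhost:8080/notsogoodapp/simpleservlet";

    // Builds a GET URL carrying the two fields the servlet reads with getParameter.
    static String buildUrl(String base, String first, String second) {
        try {
            return base + "?firstValue=" + URLEncoder.encode(first, "UTF-8")
                    + "&secondValue=" + URLEncoder.encode(second, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) throws IOException {
        String first = args.length > 0 ? args[0] : "10";
        String second = args.length > 1 ? args[1] : "0";
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(buildUrl(BASE, first, second)).toURL().openConnection();
        int status = conn.getResponseCode();
        System.out.println("HTTP " + status);
        // Error responses (such as a 500 from the blowup) come back on the error stream.
        InputStream body = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
        if (body == null) {
            return;
        }
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(body, "UTF-8"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

Run it before and after InstallRules.sh: before, a zero divisor typically gets an HTTP 500 error page from the server; after, the servlet answers (with the hack's questionable arithmetic).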

Happy Hacking!