Monday, May 18, 2015

Image processing on Hadoop using HIPI and OpenCV


HIPI meets OpenCV


Problem



This project addresses the problem of processing large volumes of image data on Apache Hadoop, using the Hadoop Image Processing Interface (HIPI) for storage and efficient distributed processing, combined with OpenCV, an open source library of rich image processing algorithms. A program that counts the number of faces in a collection of images is demonstrated.

Background



Processing a large set of images on a single machine can be very time consuming and costly. HIPI is an image processing library designed to be used with Apache Hadoop MapReduce, a software framework for processing big data in a distributed fashion on large clusters of commodity hardware. HIPI facilitates efficient and high-throughput image processing with MapReduce-style parallel programs typically executed on a cluster. It provides a solution for storing a large collection of images on the Hadoop Distributed File System (HDFS) and making them available for efficient distributed processing.


OpenCV (Open Source Computer Vision) is an open source library of rich image processing algorithms, mainly aimed at real-time computer vision. Starting with version 2.4.4, OpenCV provides Java bindings, which can be used with Apache Hadoop.

Goal



This project demonstrates how HIPI and OpenCV can be used together to count the total number of faces in a large image dataset.


Overview of Steps






Big Data Set

Test images for face detection
Input: image dataset containing 158 images (34 MB)
Format: PNG image files


The downloaded images were in GIF format; I used the Mac OS X Preview application to convert them to PNG.


Other sources for face detection image datasets:

Technologies Used:



Software Used / Purpose
VMWare Fusion: software hypervisor for running the Cloudera QuickStart VM
Cloudera QuickStart VM: VM with a single-node Hadoop cluster for testing and running MapReduce programs
IntelliJ IDEA 14 CE: Java IDE for editing and compiling Java code
Hadoop Image Processing Interface (HIPI): image processing library designed for the Apache Hadoop MapReduce parallel programming framework; stores large collections of images on HDFS for efficient distributed processing
OpenCV: image processing library aimed at real-time computer vision
Apache Hadoop: distributed processing of large data sets


References:

Steps

1. Download VMWare Fusion



VMware Fusion is a software hypervisor developed by VMware for computers running OS X with Intel processors. Fusion allows Intel-based Macs to run operating systems such as Microsoft Windows, Linux, NetWare, or Solaris on virtual machines, along with their Mac OS X operating system, using a combination of paravirtualization, hardware virtualization and dynamic recompilation. (http://en.wikipedia.org/wiki/VMware_Fusion)


Download and install VMware Fusion from the following URL; it will be used to run the Cloudera QuickStart VM.





2. Download and Setup Cloudera Quickstart VM 5.4.x



The Cloudera QuickStart VMs contain a single-node Apache Hadoop cluster, complete with example data, queries, scripts, and Cloudera Manager to manage the cluster. The VMs run CentOS 6.4 and are available for VMware, VirtualBox, and KVM. This gets us started with all the tools needed to run image processing on Hadoop.


Download the Cloudera QuickStart VM 5.4.x from the following URL. QuickStart VM 5.4 ships with Hadoop 2.6, which is needed for HIPI.




Open VMware Fusion and start the VM.




3. Getting started with HIPI



The following steps demonstrate how to set up HIPI and run a MapReduce job on Apache Hadoop.

Setup Hadoop

The Cloudera QuickStart VM 5.4.x comes pre-installed with Hadoop 2.6, which is needed for running HIPI.


Check that Hadoop is installed and is the correct version:


[cloudera@quickstart Project]$ which hadoop
/usr/bin/hadoop
[cloudera@quickstart Project]$ hadoop version
Hadoop 2.6.0-cdh5.4.0
Subversion http://github.com/cloudera/hadoop -r c788a14a5de9ecd968d1e2666e8765c5f018c271
Compiled by jenkins on 2015-04-21T19:18Z
Compiled with protoc 2.5.0
From source with checksum cd78f139c66c13ab5cee96e15a629025
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.4.0.jar


Install Apache Ant

Install Apache Ant and check that it is on the PATH:


[cloudera@quickstart Project]$ which ant
/usr/local/apache-ant/apache-ant-1.9.2/bin/ant


Install and build HIPI

There are two ways to install HIPI:
  1. Clone the latest HIPI distribution from GitHub and build from source. (https://github.com/uvagfx/hipi)
  2. Download a precompiled JAR from the downloads page. (http://hipi.cs.virginia.edu/downloads.html)


Clone HIPI GitHub repository


The best way to check and verify that your system is properly set up is to clone the official GitHub repository and build the tools and example programs.


[cloudera@quickstart Project]$ git clone https://github.com/uvagfx/hipi.git
Initialized empty Git repository in /home/cloudera/Project/hipi/.git/
remote: Counting objects: 2882, done.
remote: Total 2882 (delta 0), reused 0 (delta 0), pack-reused 2882
Receiving objects: 100% (2882/2882), 222.33 MiB | 7.03 MiB/s, done.
Resolving deltas: 100% (1767/1767), done.


Download Apache Hadoop tarball
Download the Apache Hadoop tarball from the following URL and untar it; this is needed to build HIPI.


[cloudera@quickstart Project]$ tar -xvzf /mnt/hgfs/CSCEI63/project/hadoop-2.6.0-cdh5.4.0.tar.gz
[cloudera@quickstart Project]$ ls
hadoop-2.6.0-cdh5.4.0  hipi


Build HIPI binaries


Change directory to hipi repo and build HIPI.


[cloudera@quickstart Project]$ cd hipi/
[cloudera@quickstart hipi]$ ls
3rdparty   data  examples  license.txt  release
build.xml  doc   libsrc    README.md    util


Before building HIPI, the hadoop.home and hadoop.version properties in the build.xml file should be updated with the path to the Hadoop installation and the Hadoop version being used. Change:


build.xml



  <!-- IMPORTANT: You must update the following two properties according to your Hadoop setup -->
   <!-- <property name="hadoop.home" value="/usr/local/Cellar/hadoop/2.6.0/libexec/share/hadoop" /> -->
   <!-- <property name="hadoop.version" value="2.6.0" /> -->


to


<!-- IMPORTANT: You must update the following two properties according to your Hadoop setup -->
   <property name="hadoop.home" value="/home/cloudera/Project/hadoop-2.6.0-cdh5.4.0/share/hadoop" />
   <property name="hadoop.version" value="2.6.0-cdh5.4.0" />


Build HIPI using ant


[cloudera@quickstart hipi]$ ant
Buildfile: /home/cloudera/Project/hipi/build.xml
hipi:
   [javac] Compiling 30 source files to /home/cloudera/Project/hipi/lib
     [jar] Building jar: /home/cloudera/Project/hipi/lib/hipi-2.0.jar
    [echo] Hipi library built.


compile:
   [javac] Compiling 1 source file to /home/cloudera/Project/hipi/bin
     [jar] Building jar: /home/cloudera/Project/hipi/examples/covariance.jar
    [echo] Covariance built.


all:


BUILD SUCCESSFUL
Total time: 36 seconds


Make sure the tools and examples were built:


[cloudera@quickstart hipi]$ ls
3rdparty  build.xml  doc       lib     license.txt  release  util
bin       data       examples  libsrc  README.md    tool
[cloudera@quickstart hipi]$ ls tool/
hibimport.jar
[cloudera@quickstart hipi]$ ls examples/
covariance.jar          hipi              runCreateSequenceFile.sh
createsequencefile.jar  jpegfromhib.jar   runDownloader.sh
downloader.jar          rumDumpHib.sh     runJpegFromHib.sh
dumphib.jar             runCovariance.sh  testimages.txt


Sample MapReduce Java Program


Create a SampleProgram.java class in a “sample” folder to run a simple MapReduce program over a HIB file (sample.hib, created with the hibimport tool in a later step):


[cloudera@quickstart hipi]$ mkdir sample
[cloudera@quickstart hipi]$ vi sample/SampleProgram.java


SampleProgram.java



import hipi.image.FloatImage;
import hipi.image.ImageHeader;
import hipi.imagebundle.mapreduce.ImageBundleInputFormat;


import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;


import java.io.IOException;


public class SampleProgram extends Configured implements Tool {


  public static class SampleProgramMapper extends Mapper<ImageHeader, FloatImage, IntWritable, FloatImage> {
      public void map(ImageHeader key, FloatImage value, Context context)
              throws IOException, InterruptedException {


          // Verify that image was properly decoded, is of sufficient size, and has three color channels (RGB)
          if (value != null && value.getWidth() > 1 && value.getHeight() > 1 && value.getBands() == 3) {


              // Get dimensions of image
              int w = value.getWidth();
              int h = value.getHeight();


              // Get pointer to image data
              float[] valData = value.getData();


              // Initialize 3 element array to hold RGB pixel average
              float[] avgData = {0,0,0};


              // Traverse image pixel data in raster-scan order and update running average
              for (int j = 0; j < h; j++) {
                  for (int i = 0; i < w; i++) {
                      avgData[0] += valData[(j*w+i)*3+0]; // R
                      avgData[1] += valData[(j*w+i)*3+1]; // G
                      avgData[2] += valData[(j*w+i)*3+2]; // B
                  }
              }


              // Create a FloatImage to store the average value
              FloatImage avg = new FloatImage(1, 1, 3, avgData);


              // Divide by number of pixels in image
              avg.scale(1.0f/(float)(w*h));


              // Emit record to reducer
              context.write(new IntWritable(1), avg);


          } // If (value != null...


      } // map()
  }


  public static class SampleProgramReducer extends Reducer<IntWritable, FloatImage, IntWritable, Text> {
      public void reduce(IntWritable key, Iterable<FloatImage> values, Context context)
              throws IOException, InterruptedException {


          // Create FloatImage object to hold final result
          FloatImage avg = new FloatImage(1, 1, 3);


          // Initialize a counter and iterate over IntWritable/FloatImage records from mapper
          int total = 0;
          for (FloatImage val : values) {
              avg.add(val);
              total++;
          }


          if (total > 0) {
              // Normalize sum to obtain average
              avg.scale(1.0f / total);
              // Assemble final output as string
              float[] avgData = avg.getData();
              String result = String.format("Average pixel value: %f %f %f", avgData[0], avgData[1], avgData[2]);
              // Emit output of job which will be written to HDFS
              context.write(key, new Text(result));
          }


      } // reduce()
  }


  public int run(String[] args) throws Exception {
      // Check input arguments
      if (args.length != 2) {
          System.out.println("Usage: firstprog <input HIB> <output directory>");
          System.exit(0);
      }


      // Initialize and configure MapReduce job
      Job job = Job.getInstance();
      // Set input format class which parses the input HIB and spawns map tasks
      job.setInputFormatClass(ImageBundleInputFormat.class);
      // Set the driver, mapper, and reducer classes which express the computation
      job.setJarByClass(SampleProgram.class);
      job.setMapperClass(SampleProgramMapper.class);
      job.setReducerClass(SampleProgramReducer.class);
      // Set the types for the key/value pairs passed to/from map and reduce layers
      job.setMapOutputKeyClass(IntWritable.class);
      job.setMapOutputValueClass(FloatImage.class);
      job.setOutputKeyClass(IntWritable.class);
      job.setOutputValueClass(Text.class);


      // Set the input and output paths on the HDFS
      FileInputFormat.setInputPaths(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));


      // Execute the MapReduce job and block until it completes
      boolean success = job.waitForCompletion(true);


      // Return success or failure
      return success ? 0 : 1;
  }


  public static void main(String[] args) throws Exception {
      ToolRunner.run(new SampleProgram(), args);
      System.exit(0);
  }


}


Add a new build target to hipi/build.xml to build SampleProgram.java and create a sample.jar library:


build.xml

 <target name="sample">
     <antcall target="compile">
     <param name="srcdir" value="sample" />
     <param name="jarfilename" value="sample.jar" />
     <param name="jardir" value="sample" />
     <param name="mainclass" value="SampleProgram" />
     </antcall>
  </target>
...


Build SampleProgram


[cloudera@quickstart hipi]$ ant sample
Buildfile: /home/cloudera/Project/hipi/build.xml
...
compile:
     [jar] Building jar: /home/cloudera/Project/hipi/sample/sample.jar


BUILD SUCCESSFUL
Total time: 16 seconds


Running a sample HIPI MapReduce Program



Create a sample.hib file on HDFS from the sample images provided with HIPI using the hibimport tool; this will be the input to the MapReduce program.


[cloudera@quickstart hipi]$ ls data/test/ImageBundleTestCase/read/
0.jpg  1.jpg  2.jpg  3.jpg  
[cloudera@quickstart hipi]$ hadoop jar tool/hibimport.jar data/test/ImageBundleTestCase/read examples/sample.hib
** added: 2.jpg
** added: 3.jpg
** added: 0.jpg
** added: 1.jpg
Created: examples/sample.hib and examples/sample.hib.dat
[cloudera@quickstart hipi]$ hadoop fs -ls examples
Found 2 items
-rw-r--r--   1 cloudera cloudera         80 2015-05-09 22:19 examples/sample.hib
-rw-r--r--   1 cloudera cloudera    1479345 2015-05-09 22:19 examples/sample.hib.dat


Running Hadoop MapReduce program


[cloudera@quickstart hipi]$ hadoop jar sample/sample.jar examples/sample.hib examples/output
15/05/09 23:05:10 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/05/09 23:07:05 INFO mapreduce.Job: Job job_1431127776378_0001 running in uber mode : false
15/05/09 23:07:05 INFO mapreduce.Job:  map 0% reduce 0%
15/05/09 23:08:55 INFO mapreduce.Job:  map 57% reduce 0%
15/05/09 23:09:04 INFO mapreduce.Job:  map 100% reduce 0%
15/05/09 23:09:38 INFO mapreduce.Job:  map 100% reduce 100%
15/05/09 23:09:39 INFO mapreduce.Job: Job job_1431127776378_0001 completed successfully
File Output Format Counters
Bytes Written=50


Check the program output:
[cloudera@quickstart hipi]$ hadoop fs -ls examples/output
Found 2 items
-rw-r--r--   1 cloudera cloudera          0 2015-05-09 23:09 examples/output/_SUCCESS
-rw-r--r--   1 cloudera cloudera         50 2015-05-09 23:09 examples/output/part-r-00000


The average pixel value calculated over all the images is:


[cloudera@quickstart hipi]$ hadoop fs -cat examples/output/part-r-00000
1 Average pixel value: 0.420624 0.404933 0.380449


4. Getting started with OpenCV using Java



OpenCV is an image processing library containing a large collection of image processing functions. Starting with version 2.4.4, OpenCV includes desktop Java bindings. We will use version 2.4.11 to build the Java bindings and use them with HIPI for image processing.


Download OpenCV source:



The zip bundle for the OpenCV 2.4.11 source can be downloaded from the following URL; unzip it into the ~/Project/opencv directory.


[cloudera@quickstart Project]$ mkdir opencv && cd opencv
[cloudera@quickstart opencv]$ unzip /mnt/hgfs/CSCEI63/project/opencv-2.4.11.zip


CMake build system

CMake is a family of tools designed to build, test and package software. CMake is used to control the software compilation process using simple platform and compiler independent configuration files. CMake generates native makefiles and workspaces that can be used in the compiler environment of your choice. (http://www.cmake.org/)


OpenCV is built with CMake. Download the CMake binaries from http://www.cmake.org/download/ and untar the bundle into the ~/Project/opencv directory:


[cloudera@quickstart opencv]$ tar -xvzf cmake-3.2.2-Linux-x86_64.tar.gz


Build OpenCV for Java



The following steps detail how to build OpenCV for Java on Linux.
Configure the OpenCV build on Linux:


[cloudera@quickstart opencv-2.4.11]$ ../cmake-3.2.2-Linux-x86_64/bin/cmake -DBUILD_SHARED_LIBS=OFF
.
.
.  Target "opencv_haartraining_engine" links to itself.
This warning is for project developers.  Use -Wno-dev to suppress it.


-- Generating done
-- Build files have been written to: /home/cloudera/Project/opencv/opencv-2.4.11


Build OpenCV


[cloudera@quickstart opencv-2.4.11]$ make
.
.
.
[100%] Building CXX object apps/traincascade/CMakeFiles/opencv_traincascade.dir/imagestorage.cpp.o
Linking CXX executable ../../bin/opencv_traincascade
[100%] Built target opencv_traincascade
Scanning dependencies of target opencv_annotation
[100%] Building CXX object apps/annotation/CMakeFiles/opencv_annotation.dir/opencv_annotation.cpp.o
Linking CXX executable ../../bin/opencv_annotation
[100%] Built target opencv_annotation


This creates a jar containing the Java interface (bin/opencv-2411.jar) and a native dynamic library containing the Java bindings and all of OpenCV (lib/libopencv_java2411.so). We’ll use these files to build and run the OpenCV program.


[cloudera@quickstart opencv-2.4.11]$ ls lib | grep .so
libopencv_java2411.so
[cloudera@quickstart opencv-2.4.11]$ ls bin | grep .jar
opencv-2411.jar
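

Before wiring OpenCV into Hadoop, it is worth confirming that the freshly built bindings actually load. The class below is a minimal sanity check, not part of the original write-up; the name HelloOpenCV is my own, and it only uses the standard OpenCV Java API (Core, Mat, CvType).


HelloOpenCV.java


import org.opencv.core.Core;
import org.opencv.core.CvType;
import org.opencv.core.Mat;

public class HelloOpenCV {
    public static void main(String[] args) {
        // Load the native library built above (lib/libopencv_java2411.so)
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        // Create a small identity matrix to confirm the bindings work
        Mat m = Mat.eye(3, 3, CvType.CV_8UC1);
        System.out.println("OpenCV " + Core.VERSION + " loaded:\n" + m.dump());
    }
}


It can be compiled against bin/opencv-2411.jar and run with -Djava.library.path=lib, the same flags used for the face detection sample below.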


Running OpenCV Face Detection Program



The following steps verify that OpenCV is set up correctly and works as expected.


Create a new directory sample and create an Ant build.xml file in it.


[cloudera@quickstart opencv]$ mkdir sample && cd sample
[cloudera@quickstart sample]$ vi build.xml

build.xml



<project name="Main" basedir="." default="rebuild-run">


  <property name="src.dir"     value="src"/>


  <property name="lib.dir"     value="${ocvJarDir}"/>
  <path id="classpath">
      <fileset dir="${lib.dir}" includes="**/*.jar"/>
  </path>


  <property name="build.dir"   value="build"/>
  <property name="classes.dir" value="${build.dir}/classes"/>
  <property name="jar.dir"     value="${build.dir}/jar"/>


  <property name="main-class"  value="${ant.project.name}"/>


  <target name="clean">
      <delete dir="${build.dir}"/>
  </target>


  <target name="compile">
      <mkdir dir="${classes.dir}"/>
      <javac includeantruntime="false" srcdir="${src.dir}" destdir="${classes.dir}" classpathref="classpath"/>
  </target>


  <target name="jar" depends="compile">
      <mkdir dir="${jar.dir}"/>
      <jar destfile="${jar.dir}/${ant.project.name}.jar" basedir="${classes.dir}">
          <manifest>
              <attribute name="Main-Class" value="${main-class}"/>
          </manifest>
      </jar>
  </target>


  <target name="run" depends="jar">
      <java fork="true" classname="${main-class}">
          <sysproperty key="java.library.path" path="${ocvLibDir}"/>
          <classpath>
              <path refid="classpath"/>
              <path location="${jar.dir}/${ant.project.name}.jar"/>
          </classpath>
      </java>
  </target>


  <target name="rebuild" depends="clean,jar"/>


  <target name="rebuild-run" depends="clean,run"/>


</project>


Write a program that uses OpenCV to detect the number of faces in an image.


DetectFaces.java



import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.Scalar;
import org.opencv.highgui.*;
import org.opencv.core.MatOfRect;
import org.opencv.core.Point;
import org.opencv.core.Rect;
import org.opencv.objdetect.CascadeClassifier;


import java.io.File;


/**
* Created by dmalav on 4/30/15.
*/
public class DetectFaces {


  public void run(String imageFile) {
      System.out.println("\nRunning DetectFaceDemo");


      // Create a face detector from the cascade file on the local file system.
      String xmlPath = "/home/cloudera/project/opencv-examples/lbpcascade_frontalface.xml";
      System.out.println(xmlPath);
      CascadeClassifier faceDetector = new CascadeClassifier(xmlPath);
      Mat image = Highgui.imread(imageFile);


      // Detect faces in the image.
      // MatOfRect is a special container class for Rect.
      MatOfRect faceDetections = new MatOfRect();
      faceDetector.detectMultiScale(image, faceDetections);


      System.out.println(String.format("Detected %s faces", faceDetections.toArray().length));


      // Draw a bounding box around each face.
      for (Rect rect : faceDetections.toArray()) {
          Core.rectangle(image, new Point(rect.x, rect.y), new Point(rect.x + rect.width, rect.y + rect.height), new Scalar(0, 255, 0));
      }


      File f = new File(imageFile);
      System.out.println(f.getName());
      // Save the visualized detection.
      String filename = f.getName();
      System.out.println(String.format("Writing %s", filename));
      Highgui.imwrite(filename, image);


  }
}


Main.java



import org.opencv.core.Core;


import java.io.File;


public class Main {


  public static void main(String... args) {


      System.loadLibrary(Core.NATIVE_LIBRARY_NAME);


      if (args.length == 0) {
          System.err.println("Usage Main /path/to/images");
          System.exit(1);
      }


      File[] files = new File(args[0]).listFiles();
      showFiles(files);
  }


  public static void showFiles(File[] files) {
      DetectFaces faces = new DetectFaces();
      for (File file : files) {
          if (file.isDirectory()) {
              System.out.println("Directory: " + file.getName());
              showFiles(file.listFiles()); // Calls same method again.
          } else {
              System.out.println("File: " + file.getAbsolutePath());
              faces.run(file.getAbsolutePath());
          }
      }
  }
}


Build Face Detection Java Program


[cloudera@quickstart sample]$ ant -DocvJarDir=/home/cloudera/Project/opencv/opencv-2.4.11/bin -DocvLibDir=/home/cloudera/Project/opencv/opencv-2.4.11/lib jar
Buildfile: /home/cloudera/Project/opencv/sample/build.xml


compile:
   [mkdir] Created dir: /home/cloudera/Project/opencv/sample/build/classes
   [javac] Compiling 2 source files to /home/cloudera/Project/opencv/sample/build/classes


jar:
   [mkdir] Created dir: /home/cloudera/Project/opencv/sample/build/jar
     [jar] Building jar: /home/cloudera/Project/opencv/sample/build/jar/Main.jar


BUILD SUCCESSFUL
Total time: 3 seconds


This build creates a build/jar/Main.jar file, which can be used to detect faces in images stored in a directory:


[cloudera@quickstart sample]$ java -cp ../opencv-2.4.11/bin/opencv-2411.jar:build/jar/Main.jar -Djava.library.path=../opencv-2.4.11/lib Main /mnt/hgfs/CSCEI63/project/images2
File: /mnt/hgfs/CSCEI63/project/images2/addams-family.png


Running DetectFaceDemo
/home/cloudera/Project/opencv/sample/lbpcascade_frontalface.xml
Detected 7 faces
addams-family.png
Writing addams-family.png


OpenCV detected faces




OpenCV does a fairly good job of detecting front-facing faces when the lbpcascade_frontalface.xml classifier is used. OpenCV provides other classifiers that can detect rotated faces and other face orientations, as sketched below.
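

For instance, a Haar cascade for profile (side-view) faces ships with the OpenCV source tree. The helper below is only a sketch of how DetectFaces.java could be extended to use it; the method name countProfileFaces is my own, the cascade path assumes the opencv-2.4.11 source unpacked earlier, and the existing imports of DetectFaces.java (Mat, MatOfRect, CascadeClassifier) are assumed.


   // Hypothetical addition to DetectFaces.java: count side-view faces with the
   // Haar profile-face cascade shipped in the OpenCV 2.4.11 source tree.
   public int countProfileFaces(Mat image) {
       CascadeClassifier profileDetector = new CascadeClassifier(
               "/home/cloudera/Project/opencv/opencv-2.4.11/data/haarcascades/haarcascade_profileface.xml");
       MatOfRect detections = new MatOfRect();
       profileDetector.detectMultiScale(image, detections);
       return detections.toArray().length;
   }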


5. Configure Hadoop for OpenCV



To run OpenCV Java code, the native library (Core.NATIVE_LIBRARY_NAME) must be added to Hadoop's java.library.path. The following steps detail how to set up the OpenCV native library with Hadoop.


Copy the OpenCV native library libopencv_java2411.so to /etc/opencv/lib:


[cloudera@quickstart opencv]$ pwd
/home/cloudera/Project/opencv
[cloudera@quickstart opencv]$ ls
cmake-3.2.2-Linux-x86_64  opencv-2.4.11  sample  test
[cloudera@quickstart opencv]$ sudo cp opencv-2.4.11/lib/libopencv_java2411.so /etc/opencv/lib/


Set JAVA_LIBRARY_PATH in the /usr/lib/hadoop/libexec/hadoop-config.sh file to point to the OpenCV native library:


[cloudera@quickstart Project]$ vi  /usr/lib/hadoop/libexec/hadoop-config.sh
.
.
# setup 'java.library.path' for native-hadoop code if necessary


if [ -d "${HADOOP_PREFIX}/build/native" -o -d "${HADOOP_PREFIX}/$HADOOP_COMMON_LIB_NATIVE_DIR" ]; then


 if [ -d "${HADOOP_PREFIX}/$HADOOP_COMMON_LIB_NATIVE_DIR" ]; then
   if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
     JAVA_LIBRARY_PATH=${JAVA_LIBRARY_PATH}:${HADOOP_PREFIX}/$HADOOP_COMMON_LIB_NATIVE_DIR
   else
     JAVA_LIBRARY_PATH=${HADOOP_PREFIX}/$HADOOP_COMMON_LIB_NATIVE_DIR
   fi
 fi
fi


# setup opencv native library path
JAVA_LIBRARY_PATH=${JAVA_LIBRARY_PATH}:/etc/opencv/lib


.
.
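

To confirm that the Hadoop tasks really see /etc/opencv/lib, a small diagnostic can be added where the OpenCV library is loaded in the MapReduce code (the mapper setup() shown in the next step). This is just a sketch, not part of the original code; it only prints the standard java.library.path system property, so a wrong path shows up clearly in the task logs instead of as a bare UnsatisfiedLinkError.


          // Hypothetical diagnostic for the mapper setup(): log the effective
          // native library path before attempting to load OpenCV.
          System.out.println(">>>>>> java.library.path = " + System.getProperty("java.library.path"));
          try {
              System.loadLibrary(Core.NATIVE_LIBRARY_NAME); // expects libopencv_java2411.so on that path
          } catch (UnsatisfiedLinkError e) {
              System.err.println("OpenCV native library not found on java.library.path: " + e);
          }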


6. HIPI with OpenCV



This step details the Java code that combines HIPI with OpenCV.


HIPI uses the HipiImageBundle class to represent a collection of images on HDFS and the FloatImage class to represent an image in memory. The FloatImage must be converted to OpenCV's Mat format for image processing, in this case counting faces.


The following method is used to convert a FloatImage to a Mat:


      // Convert HIPI FloatImage to OpenCV Mat
      public Mat convertFloatImageToOpenCVMat(FloatImage floatImage) {


          // Get dimensions of image
          int w = floatImage.getWidth();
          int h = floatImage.getHeight();


          // Get pointer to image data
          float[] valData = floatImage.getData();


          // Temporary 3-element array holding one RGB pixel
          double[] rgb = {0.0,0.0,0.0};


          Mat mat = new Mat(h, w, CvType.CV_8UC3);


          // Traverse image pixel data in raster-scan order and copy each pixel into the Mat
          for (int j = 0; j < h; j++) {
              for (int i = 0; i < w; i++) {
                  rgb[0] = (double) valData[(j*w+i)*3+0] * 255.0; // R
                  rgb[1] = (double) valData[(j*w+i)*3+1] * 255.0; // G
                  rgb[2] = (double) valData[(j*w+i)*3+2] * 255.0; // B
                  mat.put(j, i, rgb);
              }
          }


          return mat;
      }
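

One caveat in this converter: OpenCV's 8-bit color Mats conventionally store pixels in BGR order, while FloatImage holds RGB floats (normalized to [0, 1], judging by the average-pixel output earlier). The variant below is only a hedged sketch of an alternative, not the code used in the project: it writes the channels in BGR order, clamps to [0, 255], and copies the whole buffer in one call. The method name convertFloatImageToMatBGR is my own, and the same imports as FaceCount.java are assumed.


      // Hypothetical variant: fill a byte[] buffer in BGR order, clamp to [0, 255],
      // and copy it into the Mat with a single put() call.
      public Mat convertFloatImageToMatBGR(FloatImage floatImage) {
          int w = floatImage.getWidth();
          int h = floatImage.getHeight();
          float[] valData = floatImage.getData();

          byte[] pixels = new byte[w * h * 3];
          for (int j = 0; j < h; j++) {
              for (int i = 0; i < w; i++) {
                  int idx = (j * w + i) * 3;
                  // FloatImage stores RGB; OpenCV expects BGR in CV_8UC3 Mats
                  pixels[idx + 0] = (byte) Math.min(255, Math.max(0, Math.round(valData[idx + 2] * 255.0f))); // B
                  pixels[idx + 1] = (byte) Math.min(255, Math.max(0, Math.round(valData[idx + 1] * 255.0f))); // G
                  pixels[idx + 2] = (byte) Math.min(255, Math.max(0, Math.round(valData[idx + 0] * 255.0f))); // R
              }
          }

          Mat mat = new Mat(h, w, CvType.CV_8UC3);
          mat.put(0, 0, pixels); // bulk copy of the whole image
          return mat;
      }


Whether channel order alone explains the zero-face result seen later is not certain, but it is a cheap thing to rule out.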


To count the number of faces in an image we need to create a CascadeClassifier, which reads a classifier file. That file must be present on HDFS; it is added to the job with the Job.addCacheFile method and later retrieved in the Mapper class.


    public int run(String[] args) throws Exception {
     ....


      // Initialize and configure MapReduce job
      Job job = Job.getInstance();
      
      ....
      
      // add cascade file
      job.addCacheFile(new URI("/user/cloudera/lbpcascade_frontalface.xml#lbpcascade_frontalface.xml"));


      // Execute the MapReduce job and block until it completes
      boolean success = job.waitForCompletion(true);


      // Return success or failure
      return success ? 0 : 1;
  }


Override the Mapper setup() method to load the OpenCV native library and create the CascadeClassifier for face detection:


public static class FaceCountMapper extends Mapper<ImageHeader, FloatImage, IntWritable, IntWritable> {


      // Face detector created in setup() from the cascade file distributed
      // via the job's cache files.
      private CascadeClassifier faceDetector;


      
      public void setup(Context context)
              throws IOException, InterruptedException {


          // Load OpenCV native library
          try {
              System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
          } catch (UnsatisfiedLinkError e) {
              System.err.println("Native code library failed to load.\n" + e + Core.NATIVE_LIBRARY_NAME);
              System.exit(1);
          }


          // Load cached cascade file for front face detection and create CascadeClassifier
          if (context.getCacheFiles() != null && context.getCacheFiles().length > 0) {
              URI mappingFileUri = context.getCacheFiles()[0];


              if (mappingFileUri != null) {
                  faceDetector = new CascadeClassifier("./lbpcascade_frontalface.xml");


              } else {
                  System.out.println(">>>>>> NO MAPPING FILE");
              }
          } else {
              System.out.println(">>>>>> NO CACHE FILES AT ALL");
          }


          super.setup(context);
      } // setup()
      ....
}


The full listing of FaceCount.java is given below. It works as follows:


Mapper:
  1. Load OpenCV native library
  2. Create CascadeClassifier
  3. Convert HIPI FloatImage to OpenCV Mat
  4. Detect and count faces in the image
  5. Write number of faces detected to context


Reducer:
  1. Count number of files processed
  2. Count number of faces detected
  3. Output number of files and faces detected


FaceCount.java



import hipi.image.FloatImage;
import hipi.image.ImageHeader;
import hipi.imagebundle.mapreduce.ImageBundleInputFormat;


import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;


import org.opencv.core.*;
import org.opencv.objdetect.CascadeClassifier;


import java.io.IOException;
import java.net.URI;


public class FaceCount extends Configured implements Tool {


  public static class FaceCountMapper extends Mapper<ImageHeader, FloatImage, IntWritable, IntWritable> {


      // Face detector created in setup() from the cascade file distributed
      // via the job's cache files.
     private CascadeClassifier faceDetector;


      // Convert HIPI FloatImage to OpenCV Mat
      public Mat convertFloatImageToOpenCVMat(FloatImage floatImage) {


          // Get dimensions of image
          int w = floatImage.getWidth();
          int h = floatImage.getHeight();


          // Get pointer to image data
          float[] valData = floatImage.getData();


          // Temporary 3-element array holding one RGB pixel
          double[] rgb = {0.0,0.0,0.0};


          Mat mat = new Mat(h, w, CvType.CV_8UC3);


          // Traverse image pixel data in raster-scan order and copy each pixel into the Mat
          for (int j = 0; j < h; j++) {
              for (int i = 0; i < w; i++) {
                  rgb[0] = (double) valData[(j*w+i)*3+0] * 255.0; // R
                  rgb[1] = (double) valData[(j*w+i)*3+1] * 255.0; // G
                  rgb[2] = (double) valData[(j*w+i)*3+2] * 255.0; // B
                  mat.put(j, i, rgb);
              }
          }


          return mat;
      }


      // Count faces in image
      public int countFaces(Mat image) {


          // Detect faces in the image.
          // MatOfRect is a special container class for Rect.
          MatOfRect faceDetections = new MatOfRect();
          faceDetector.detectMultiScale(image, faceDetections);


          return faceDetections.toArray().length;
      }


      public void setup(Context context)
              throws IOException, InterruptedException {


          // Load OpenCV native library
          try {
              System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
          } catch (UnsatisfiedLinkError e) {
              System.err.println("Native code library failed to load.\n" + e + Core.NATIVE_LIBRARY_NAME);
              System.exit(1);
          }


          // Load cached cascade file for front face detection and create CascadeClassifier
          if (context.getCacheFiles() != null && context.getCacheFiles().length > 0) {
              URI mappingFileUri = context.getCacheFiles()[0];


              if (mappingFileUri != null) {
                  faceDetector = new CascadeClassifier("./lbpcascade_frontalface.xml");


              } else {
                  System.out.println(">>>>>> NO MAPPING FILE");
              }
          } else {
              System.out.println(">>>>>> NO CACHE FILES AT ALL");
          }


          super.setup(context);
      } // setup()


      public void map(ImageHeader key, FloatImage value, Context context)
              throws IOException, InterruptedException {


          // Verify that image was properly decoded, is of sufficient size, and has three color channels (RGB)
          if (value != null && value.getWidth() > 1 && value.getHeight() > 1 && value.getBands() == 3) {


              Mat cvImage = this.convertFloatImageToOpenCVMat(value);


              int faces = this.countFaces(cvImage);


              System.out.println(">>>>>> Detected Faces: " + Integer.toString(faces));


              // Emit record to reducer
              context.write(new IntWritable(1), new IntWritable(faces));


          } // If (value != null...


      } // map()
  }


  public static class FaceCountReducer extends Reducer<IntWritable, IntWritable, IntWritable, Text> {
      public void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
              throws IOException, InterruptedException {


           // Initialize counters and iterate over the IntWritable face counts from the mappers
          int total = 0;
          int images = 0;
          for (IntWritable val : values) {
              total += val.get();
              images++;
          }


          String result = String.format("Total face detected: %d", total);
          // Emit output of job which will be written to HDFS
          context.write(new IntWritable(images), new Text(result));
      } // reduce()
  }


  public int run(String[] args) throws Exception {
      // Check input arguments
      if (args.length != 2) {
          System.out.println("Usage: firstprog <input HIB> <output directory>");
          System.exit(0);
      }


      // Initialize and configure MapReduce job
      Job job = Job.getInstance();
      // Set input format class which parses the input HIB and spawns map tasks
      job.setInputFormatClass(ImageBundleInputFormat.class);
      // Set the driver, mapper, and reducer classes which express the computation
      job.setJarByClass(FaceCount.class);
      job.setMapperClass(FaceCountMapper.class);
      job.setReducerClass(FaceCountReducer.class);
      // Set the types for the key/value pairs passed to/from map and reduce layers
      job.setMapOutputKeyClass(IntWritable.class);
      job.setMapOutputValueClass(IntWritable.class);
      job.setOutputKeyClass(IntWritable.class);
      job.setOutputValueClass(Text.class);


      // Set the input and output paths on the HDFS
      FileInputFormat.setInputPaths(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));


      // add cascade file
      job.addCacheFile(new URI("/user/cloudera/lbpcascade_frontalface.xml#lbpcascade_frontalface.xml"));


      // Execute the MapReduce job and block until it completes
      boolean success = job.waitForCompletion(true);


      // Return success or failure
      return success ? 0 : 1;
  }


  public static void main(String[] args) throws Exception {


      ToolRunner.run(new FaceCount(), args);
      System.exit(0);
  }


}


7. Build FaceCount.java as facecount.jar



Create a new facecount directory in the hipi folder (where HIPI was built) and copy FaceCount.java from the previous step into it.


[cloudera@quickstart hipi]$ pwd
/home/cloudera/Project/hipi
[cloudera@quickstart hipi]$ mkdir facecount
[cloudera@quickstart hipi]$ cp /mnt/hgfs/CSCEI63/project/hipi/src/FaceCount.java facecount/
[cloudera@quickstart hipi]$ ls
3rdparty   data      facecount    libsrc       README.md  sample
bin        doc       hipiwrapper  license.txt  release    tool
build.xml  examples  lib          my.diff      run.sh     util
[cloudera@quickstart hipi]$ ls facecount/
FaceCount.java


Modify the HIPI build.xml Ant script to link against the OpenCV jar file and add a new build target, facecount.


build.xml



<project basedir="." default="all">


<target name="setup">


....


  <!-- opencv dependencies -->
  <property name="opencv.jar" value="../opencv/opencv-2.4.11/bin/opencv-2411.jar" />


  <echo message="Properties set."/>
</target>


<target name="compile" depends="setup,test_settings,hipi">


  <mkdir dir="bin" />
 
  <!-- Compile -->
  <javac debug="yes" nowarn="on" includeantruntime="no" srcdir="${srcdir}" destdir="./bin" classpath="${hadoop.classpath}:./lib/hipi-${hipi.version}.jar:${opencv.jar}">
    <compilerarg value="-Xlint:deprecation" />
  </javac>
 
  <!-- Create the jar -->
  <jar destfile="${jardir}/${jarfilename}" basedir="./bin">
    <zipfileset src="./lib/hipi-${hipi.version}.jar" />
    <zipfileset src="${opencv.jar}" />
    <manifest>
        <attribute name="Main-Class" value="${mainclass}" />
    </manifest>
  </jar>
 
</target>
....


 <target name="facecount">
    <antcall target="compile">
    <param name="srcdir" value="facecount" />
    <param name="jarfilename" value="facecount.jar" />
    <param name="jardir" value="facecount" />
    <param name="mainclass" value="FaceCount" />
    </antcall>
 </target>


 <target name="all" depends="hipi,hibimport,downloader,dumphib,jpegfromhib,createsequencefile,covariance" />


 <!-- Clean -->
 <target name="clean">
   <delete dir="lib" />
   <delete dir="bin" />
   <delete>
     <fileset dir="." includes="examples/*.jar,experiments/*.jar" />
   </delete>
 </target>
</project>


Build FaceCount.java


[cloudera@quickstart hipi]$ ant facecount
Buildfile: /home/cloudera/Project/hipi/build.xml


facecount:


setup:
    [echo] Setting properties for build task...
    [echo] Properties set.


test_settings:
    [echo] Confirming that hadoop settings are set...
    [echo] Properties are specified properly.


hipi:
    [echo] Building the hipi library...


hipi:
   [javac] Compiling 30 source files to /home/cloudera/Project/hipi/lib
     [jar] Building jar: /home/cloudera/Project/hipi/lib/hipi-2.0.jar
    [echo] Hipi library built.


compile:
     [jar] Building jar: /home/cloudera/Project/hipi/facecount/facecount.jar


BUILD SUCCESSFUL
Total time: 12 seconds


Check that facecount.jar was built under the facecount directory:


[cloudera@quickstart hipi]$ ls facecount/
facecount.jar  FaceCount.java


8. Run FaceCount MapReduce job



Set up input images



[cloudera@quickstart hipi]$ ls /mnt/hgfs/CSCEI63/project/images-png/
217.png               eugene.png                    patio.png
221.png               ew-courtney-david.png         people.png
3.png                 ew-friends.png                pict_28.png
….


Create HIB



The primary input type to a HIPI program is a HipiImageBundle (HIB), which stores a collection of images on the Hadoop Distributed File System (HDFS). Use the hibimport tool to create a HIB (project/input.hib) from the images on the local file system in /mnt/hgfs/CSCEI63/project/images-png/ by executing the following commands from the HIPI root directory:


[cloudera@quickstart hipi]$ hadoop fs -mkdir project
[cloudera@quickstart hipi]$ hadoop jar tool/hibimport.jar /mnt/hgfs/CSCEI63/project/images-png/ project/input.hib
** added: 217.png
** added: 221.png
** added: 3.png
** added: addams-family.png
** added: aeon1a.png
** added: aerosmith-double.png
.
.
.
** added: werbg04.png
** added: window.png
** added: wxm.png
** added: yellow-pages.png
** added: ysato.png
Created: project/input.hib and project/input.hib.dat


Run MapReduce



Create a run-facecount.sh script to remove any previous output directory and execute the MapReduce job:


run-facecount.sh



#!/bin/bash
hadoop fs -rm -R project/output
hadoop jar facecount/facecount.jar project/input.hib project/output


[cloudera@quickstart hipi]$ bash run-facecount.sh
15/05/12 16:48:06 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted project/output
15/05/12 16:48:17 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/05/12 16:48:20 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/05/12 16:48:21 INFO input.FileInputFormat: Total input paths to process : 1
Spawned 1map tasks
15/05/12 16:48:22 INFO mapreduce.JobSubmitter: number of splits:1
15/05/12 16:48:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1431127776378_0049
15/05/12 16:48:25 INFO impl.YarnClientImpl: Submitted application application_1431127776378_0049
15/05/12 16:48:25 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1431127776378_0049/
15/05/12 16:48:25 INFO mapreduce.Job: Running job: job_1431127776378_0049
15/05/12 16:48:58 INFO mapreduce.Job: Job job_1431127776378_0049 running in uber mode : false
15/05/12 16:48:58 INFO mapreduce.Job:  map 0% reduce 0%
15/05/12 16:49:45 INFO mapreduce.Job:  map 3% reduce 0%
15/05/12 16:50:53 INFO mapreduce.Job:  map 5% reduce 0%
15/05/12 16:50:57 INFO mapreduce.Job:  map 8% reduce 0%
15/05/12 16:51:01 INFO mapreduce.Job:  map 11% reduce 0%
15/05/12 16:51:09 INFO mapreduce.Job:  map 15% reduce 0%
15/05/12 16:51:13 INFO mapreduce.Job:  map 22% reduce 0%
15/05/12 16:51:18 INFO mapreduce.Job:  map 25% reduce 0%
15/05/12 16:51:21 INFO mapreduce.Job:  map 28% reduce 0%
15/05/12 16:51:29 INFO mapreduce.Job:  map 31% reduce 0%
15/05/12 16:51:32 INFO mapreduce.Job:  map 33% reduce 0%
15/05/12 16:51:45 INFO mapreduce.Job:  map 38% reduce 0%
15/05/12 16:51:57 INFO mapreduce.Job:  map 51% reduce 0%
15/05/12 16:52:07 INFO mapreduce.Job:  map 55% reduce 0%
15/05/12 16:52:10 INFO mapreduce.Job:  map 58% reduce 0%
15/05/12 16:52:14 INFO mapreduce.Job:  map 60% reduce 0%
15/05/12 16:52:18 INFO mapreduce.Job:  map 63% reduce 0%
15/05/12 16:52:26 INFO mapreduce.Job:  map 100% reduce 0%
15/05/12 16:52:59 INFO mapreduce.Job:  map 100% reduce 100%
15/05/12 16:53:01 INFO mapreduce.Job: Job job_1431127776378_0049 completed successfully
15/05/12 16:53:02 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=1576
FILE: Number of bytes written=226139
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=35726474
HDFS: Number of bytes written=27
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=205324
Total time spent by all reduces in occupied slots (ms)=29585
Total time spent by all map tasks (ms)=205324
Total time spent by all reduce tasks (ms)=29585
Total vcore-seconds taken by all map tasks=205324
Total vcore-seconds taken by all reduce tasks=29585
Total megabyte-seconds taken by all map tasks=210251776
Total megabyte-seconds taken by all reduce tasks=30295040
Map-Reduce Framework
Map input records=157
Map output records=157
Map output bytes=1256
Map output materialized bytes=1576
Input split bytes=132
Combine input records=0
Combine output records=0
Reduce input groups=1
Reduce shuffle bytes=1576
Reduce input records=157
Reduce output records=1
Spilled Records=314
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=11772
CPU time spent (ms)=45440
Physical memory (bytes) snapshot=564613120
Virtual memory (bytes) snapshot=3050717184
Total committed heap usage (bytes)=506802176
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=35726342
File Output Format Counters
Bytes Written=27


Check results:


[cloudera@quickstart hipi]$ hadoop fs -ls project/output
Found 2 items
-rw-r--r--   1 cloudera cloudera          0 2015-05-12 16:52 project/output/_SUCCESS
-rw-r--r--   1 cloudera cloudera         27 2015-05-12 16:52 project/output/part-r-00000
[cloudera@quickstart hipi]$ hadoop fs -cat project/output/part-r-00000
157 Total face detected: 0


9. Summary



OpenCV provides a very rich set of image processing tools; combined with HIPI's efficient, high-throughput parallel image processing, it can be a great solution for processing very large image datasets quickly. Together these tools can help researchers and engineers alike achieve high-performance image processing.


10. Issues



The wrapper function that converts a HIPI FloatImage to an OpenCV Mat did not work as expected and does not produce a correct image after conversion, which leads to no faces being detected. I contacted the HIPI maintainers but did not receive a reply in time before finishing this project. This bug causes my results to show 0 faces detected.
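

One way to narrow the problem down, which I did not get to run as part of the project, is to dump a few converted Mats back to image files and inspect them by eye: if the written images look wrong, the converter rather than the classifier is at fault. The helper below is only a sketch and the name dumpMatForInspection is my own; Highgui.imwrite picks the output format from the file extension.


      // Hypothetical debugging helper: write a converted Mat to a PNG in the current
      // working directory so the conversion can be inspected after the run.
      public static void dumpMatForInspection(Mat mat, String name) {
          boolean ok = org.opencv.highgui.Highgui.imwrite(name + ".png", mat);
          System.out.println(">>>>>> dumped " + name + ".png, success=" + ok);
      }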


11. Benefits (Pros/Cons)



Pros: HIPI is a great tool for processing very large volumes of images on a Hadoop cluster; combined with OpenCV it can be very powerful.
Cons: Converting the HIPI image format (FloatImage) to the OpenCV Mat format is not straightforward, and the conversion issues prevented OpenCV from processing the images correctly.


12. Lessons Learned


  1. Setting up the Cloudera QuickStart VM
  2. Using HIPI to run MapReduce over a large volume of images
  3. Using OpenCV in a Java environment
  4. Setting up Hadoop to load native libraries
  5. Using cached files on HDFS

164 comments:

  1. yes, I like it. I have question for Dinesh Malav. You can use SIFTDectector (Opencv) to find the same image on Hadoop using HIPI?

    ReplyDelete
  2. I am not familiar with SIFTDectector but a wrapper to convert HIPI FloatImage to any required format can be written and used with OpenCV.

    ReplyDelete
    Replies
    1. Hi Dinesh Malav,

      Subject: Image matching with hipi

      I want to know whether is it possible to search a image in the bundle that match to the image I give. Thanks

      Delete
  3. This comment has been removed by the author.

    ReplyDelete
    Replies
    1. This comment has been removed by the author.

      Delete
  4. Hello Dinesh

    Did you try to run this MapReduce prog. in Hipi using eclipse by having separate classes like driver, mapper and reducer. If yes, please give me the steps of eclipse. Creating with single file.java works fine with me.. any help!

    -Prasad

    ReplyDelete
  5. The Hadoop tutorial you have explained is most useful for begineers who are taking Hadoop Administrator Online Training
    Thank you for sharing Such a good tutorials on Hadoop Image Processing

    ReplyDelete
  6. Apart from learning more about Hadoop at hadoop online training, this blog adds to my learning platforms. Great work done by the webmasters. Thanks for your research and experience sharing on a platform like this.

    ReplyDelete
  7. Some topics covered,may be it helps someone,HDFS is a Java-based file system that provides scalable and reliable data storage,and it was designed to span large clusters of commodity servers. HDFS has demonstrated production scalability of up to 200 PB of storage and a single cluster of 4500 servers, supporting close to a billion files and blocks.
    http://www.computaholics.in/2015/12/hdfs.html
    http://www.computaholics.in/2015/12/mapreduce.html
    http://www.computaholics.in/2015/11/hadoop-fs-commands.html

    ReplyDelete
  8. hi,dinesh
    i have one project in which i have to give image of one person it will search into a video frame by frame,match and give the result like count of face detection,timing of appearance etc. so is it possible with HiPi and opencv??

    ReplyDelete
  9. I have my hipi with build.gradle file..!!! Where do i need to specify the opencv dependencies in hipi instead of build.xml........??????

    ReplyDelete
    Replies
    1. did you get any clue for this?

      Delete
    2. Well i am too facing same issue..did anyone find solution?

      Delete
    3. I think it is because you are using latest HIPI version, Dinesh used HIPI 2.0.
      I don't have solution to fix it though...

      Delete
  10. Does the bug u mentioned is solved

    ReplyDelete
  11. Hey Dinesh, nice tutorial. Very helpful.
    Can you help a bit more. I am getting problem with opencv native library. The library is loaded. But still I am getting the error:

    Exception in thread "main" java.lang.UnsatisfiedLinkError: org.opencv.core.Mat.n_Mat()J
    at org.opencv.core.Mat.n_Mat(Native Method)
    at org.opencv.core.Mat.(Mat.java:24)
    at xugglerTest.readVideoFile.run(readVideoFile.java:100)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at xugglerTest.readVideoFile.main(readVideoFile.java:113)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

    ReplyDelete
  12. This comment has been removed by the author.

    ReplyDelete
  13. Hey Dinesh,very nice Cool tutorial!!
    So,could u help bit more to send me a copy of ur demo code.pretty much thanks for you.(fengbosapphire@foxmail.com)!

    ReplyDelete
  14. Hi,
    I'm try to run the jar file but i got exception from container failure.i couldn't fix the error. plz any one help me to resolve this error.....

    ReplyDelete
  15. hi , Does HIPI use datnode , namenode , job tracker and tasktracker ?

    ReplyDelete
  16. hi , Does HIPI use datnode , namenode , job tracker and tasktracker after culling step?

    If yes what would be the flow of processing?

    ReplyDelete
  17. Hi Dinesh,
    Nice tutorial.
    Does HIPI work on Hadoop-1.2.1 ? As i am new to Hadoop I installed basic version Hadoop-1.2.1. So,can I start installing HIPI with this or do I need to install Hadoop-2.6.0.

    ReplyDelete
  18. This comment has been removed by the author.

    ReplyDelete
  19. Hi Dinesh,

    Thanks for sharing a good tutorial for Hadoop, Hipi and OpenCV.

    I am getting the following error while running my program on Hadoop with Hipi:


    Exception in thread "main" java.lang.NoClassDefFoundError: hipi/imagebundle/mapreduce/ImageBundleInputFormat at AveragePixelColor.run(AveragePixelColor.java:114) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at AveragePixelColor.main(AveragePixelColor.java:144) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

    Caused by: java.lang.ClassNotFoundException: hipi.imagebundle.mapreduce.ImageBundleInputFormat at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 10 more


    Hipi JAR file is already included in HADOOP_CLASSPATH and I tried providing the JAR using -libjars as well. No success. Do you have any idea of resolving it.

    Regards,

    ReplyDelete
    Replies
    1. Try this Workaround: sudo cp ~/Downloads/hipi-2.0.jar /usr/lib/hadoop/

      Delete
  20. Really awesome blog. Your blog is really useful for me. Thanks for sharing this informative blog. Keep update your blog.


    Chennai Bigdata Training

    ReplyDelete
  21. hello sir,
    thanks for this awesome tutorial.
    can you please send me the complete build.xml file for HIPI which is shown in step 7 or if possible can you please tell us how to build this FaceCount.java using gradle since the latest github repo of HIPI doesn't contain build.xml. (my mail id: bscniitmunna@gmail.com)

    ReplyDelete
  22. Thanks for sharing this article.. You may also refer http://www.s4techno.com/blog/2016/07/11/hadoop-administrator-interview-questions/..

    ReplyDelete
  23. This article describes the Hadoop Software, All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common and should be automatically handled by the framework. This post gives great idea on Hadoop Certification for beginners. Also find best Hadoop Online Training in your locality at StaygreenAcademy.com

    ReplyDelete
  24. Hi Dinesh,

    Issues section describes that the number of faced detected seem to be zero due to some HIPI related errors. Is there any way that we can resolve this. We are trying to build the same use case here. It would be really helpful if we can get a solution to this at the earliest.

    ReplyDelete
    Replies
    1. Hi Afzal,
      I have my hipi with build.gradle file instead of build.xml.Am i missing something.......??????

      Delete
  25. Thanks for providing this informative information you may also refer.
    http://www.s4techno.com/blog/2016/08/13/installing-a-storm-cluster/

    ReplyDelete
  26. This comment has been removed by the author.

    ReplyDelete
  27. I just did you example and It works, I don't know if hipi team has changed something but I'm getting the found faces! Thank you so much.

    I use open cv 3.0.0 instead...

    ReplyDelete
  28. onlineitguru offers job oriented Big data big data hadoop online training and Certification

    Course and become Expert in big data hadoop .

    ReplyDelete
  29. The question was hypothetical. There was no specific job I was thinking of. But after you saying that databases like greenplum allows mixing of map reduce code and sql queries, it suddenly dawned to me that my database might be doing the same as well. But just to know your thoughts because I don’t know, I am currently using MongoDB, do you know if it optimizes like Greenplum does?

    hadoop training in chennai

    ReplyDelete
  30. Thanks for this valuable info. I was going through all the your ppt one of the recommanded ppt for all hadoop learners in

    Hadoop training

    Hadoop Online Training in india|usa|uk

    ReplyDelete
  31. Thanks in advance for the article. However, the VM link is no longer available.

    ReplyDelete
  32. This comment has been removed by the author.

    ReplyDelete
  33. Buildfile: /home/faisal/hipi/sample/build.xml

    BUILD FAILED
    /home/faisal/hipi/sample/build.xml:1: Unexpected element "{}target" {antlib:org.apache.tools.ant}target

    Total time: 0 seconds



    hi i got this error can any one help

    ReplyDelete
  34. These instructions are 2 years old, please use new libraries. I will try to update instruction in with newer libraries.

    ReplyDelete
    Replies
    1. can you direct me to the new libraries or a send a link to new way
      thanks a lot

      Delete
  35. The blog you shared is really good.
    Should be a Big Data Hadoop Developer? Join TechandMate for the Big Data Hadoop Online Tutorial and learn MapReduce structure, HDFS thoughts, Hadoop Cluster, Sqoop, Flume, Hive, Pig and YARN. https://goo.gl/6RHMF2

    ReplyDelete
  36. http://worldofbigdata-inaction.blogspot.in/2017/02/processing-images-in-hadoop-using.html

    This is my blog and explains another issue that I felt while using HIPI and solution to overcome the same

    ReplyDelete
  37. @ Dinesh - Can you please provide any other methods which can be used for image processing in hadoop?

    ReplyDelete
  72. Hi Malav, great tutorial!

    I get an IllegalArgumentException while running the MapReduce job:

    hadoop jar facecount/facecount.jar HipiprojectHib/TestImageFace.hib project/output

    Brief Error log:

    18/03/29 09:27:59 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/abiodun/.staging/job_1522223364790_0024
    Exception in thread "main" java.lang.IllegalArgumentException: Can not create a Path from an empty string
    at org.apache.hadoop.fs.Path.checkPathArg(Path.java:163)
    at org.apache.hadoop.fs.Path.<init>(Path.java:175)
    at org.apache.hadoop.fs.Path.<init>(Path.java:120)
    at hipi.imagebundle.HipiImageBundle.readBundleHeader(HipiImageBundle.java:364)


    I would appreciate any suggestions to help resolve this issue. Many thanks.

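    This "empty string" Path error is thrown while HIPI reads the bundle header, which typically points to a HIB that is missing or was not fully written on HDFS, or to an empty path argument reaching the driver. A minimal, hypothetical pre-flight check; the class name, argument order, and the <name>.hib / <name>.hib.dat layout are assumptions based on the command above and older HIPI conventions:

        import org.apache.hadoop.conf.Configured;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.util.Tool;
        import org.apache.hadoop.util.ToolRunner;

        public class FaceCountCheck extends Configured implements Tool {
          @Override
          public int run(String[] args) throws Exception {
            // An empty or missing argument is one way an empty-string Path
            // can reach the job setup code.
            if (args.length != 2 || args[0].isEmpty() || args[1].isEmpty()) {
              System.err.println("Usage: FaceCount <input HIB> <output directory>");
              return -1;
            }
            Path hib = new Path(args[0]);
            FileSystem fs = hib.getFileSystem(getConf());
            // Older HIPI stores a bundle as two HDFS files: the index (<name>.hib)
            // and its data file (<name>.hib.dat); reading the header can fail if
            // either is missing or the import that created them was interrupted.
            if (!fs.exists(hib) || !fs.exists(hib.suffix(".dat"))) {
              System.err.println("HIB or its .dat companion not found: " + hib);
              return -1;
            }
            // ...configure and submit the actual FaceCount job here...
            return 0;
          }

          public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new FaceCountCheck(), args));
          }
        }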