balto

Reputation: 137

How to build and execute examples in Mahout in Action

I am working through Mahout in Action and would like to know how to build and execute the examples in the book. I can find instructions for Eclipse, but my environment has no UI. So I copied the first example (RecommenderIntro) into RecommenderIntro.java and compiled it with javac.

I got errors because the imported packages could not be found. So I am looking for:

  1. A way to resolve the missing packages.

  2. Assuming it compiles successfully and a .class file is generated, how do I execute it? Through "java RecommenderIntro"? I can run the bundled Mahout examples with sudo -u hdfs hadoop jar mahout-examples-0.7-cdh4.2.0-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job; how can I do something similar for my own example?

  3. All my data is stored in HBase tables, but neither the book nor Google shows me a way to integrate Mahout with HBase. Any suggestions?

Upvotes: 0

Views: 783

Answers (1)

BKersbergen

Reputation: 31

q1 and q2: you need a Java build tool like Maven. You build the Hadoop jar with 'mvn clean install'; this creates your Hadoop job in target/mia-job.jar. You then execute your job with: hadoop jar target/mia-job.jar RecommenderIntro inputDirIgnored outputDirIgnored (RecommenderIntro ignores its parameters, but Hadoop forces you to specify at least two, usually the input and output directories).

q3: You can't out of the box. Option 1: export your HBase data to a text file 'intro.csv' with lines like "%userId%, %ItemId%, %score%", as described in the book, because that's the file RecommenderIntro is looking for. Option 2: modify the example code to read its data from HBase...
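For reference, a minimal sketch of what RecommenderIntro-style code looks like when it reads intro.csv, assuming the Mahout 0.7 Taste API (the class names below come from that API; the file name, user id and neighborhood size are just illustrative):

    import java.io.File;
    import java.util.List;

    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    class RecommenderIntro {
      public static void main(String[] args) throws Exception {
        // Load the "userId,itemId,score" data exported from HBase.
        DataModel model = new FileDataModel(new File("intro.csv"));
        // Simple user-based recommender, as in the book's first example.
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
        // Recommend one item for user 1 and print it.
        List<RecommendedItem> recommendations = recommender.recommend(1, 1);
        for (RecommendedItem recommendation : recommendations) {
          System.out.println(recommendation);
        }
      }
    }

Compiling this with plain javac requires the Mahout core jar and its dependencies on the classpath, which is exactly why a build tool like Maven is recommended above.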

ps1. For developing such an application I'd really advise using an IDE, because it gives you code completion, building, execution, etc. A simple way to get started is to download a virtual image with Hadoop preinstalled, such as the Cloudera or Hortonworks ones, and install an IDE like Eclipse. You can also configure these images to use your Hadoop cluster, but you don't need to for small data sets.

ps2. The RecommenderIntro code isn't a distributed implementation and thus can't handle large datasets. It also runs locally instead of on a Hadoop cluster.

Upvotes: 2
