Reputation: 3183
I am new to Mahout and want to install it and try it out. So far I have Maven 3 and Java 1.6 installed and configured on my Mac. My question is:
Do I have to install Hadoop before installing Mahout?
Some tutorials include installing Hadoop and some do not, which confuses me. I know Mahout is built on top of Hadoop, but not all of Mahout depends on it.
Can someone point me to some useful, detailed resources on installation?
Upvotes: 2
Views: 4266
Reputation: 300
Giving another answer to this question now that it's two years later and I finally got an itemsimilarity command to run on a mac after a lot of cursing and some blood spilled... Hope this saves someone some time and misery. Except my coworkers! Your weakness disgusts me! Anyway...
First for the "do I need $FINICKY_BIG_DATA_PLATFORM" question, see:
http://mahout.apache.org/users/basics/algorithms.html
Hadoop and Spark are not hard requirements; some algorithms run on a single machine. But the algorithm you are interested in may run only on Hadoop and/or Spark. The docs on recommendations also steer you pretty strongly toward the Spark-based algorithms, and they encourage you to use the black-box command-line tools, whose arguments can differ between the single-machine and Spark versions (itemsimilarity, for example). So you don't NEED it, but you'll probably still need it.
I tried brew installs of hadoop, apache-spark and mahout. If you use the absolute latest versions (mahout 0.11.0, apache-spark 1.4.1, hadoop 2.7.1), you may hit some of these problems:
"Got error Cannot find Spark class path. Is 'SPARK_HOME' set?" To fix this, you not only need that environment variable set (mine is set to "/usr/local/Cellar/apache-spark/1.4.1/libexec"), you also need the apparently now deprecated compute-classpath.sh script in ${SPARK_HOME}/bin/. I had a 1.2.0 Spark installation handy, so I lifted one from there.
Bonus gotcha: in that 1.2.0 install there are two compute-classpath.sh scripts, and one is just a one-liner invoking the other. You will probably be happier if you copy over the "real" one, so use less to check which is which.
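Before rerunning mahout, it can save a round trip to confirm that both conditions for the SPARK_HOME error hold. This is only a rough sketch: the class name SparkHomeCheck and its check method are made up for illustration, and it only tests that the variable is set and that a compute-classpath.sh file exists under bin/, not that it is the "real" script.

```java
import java.io.File;

public class SparkHomeCheck {

    // Returns "ok" or a short description of what looks wrong.
    // Hypothetical helper; it mirrors the two conditions described above.
    static String check(String sparkHome) {
        if (sparkHome == null || sparkHome.isEmpty()) {
            return "SPARK_HOME is not set";
        }
        File script = new File(new File(sparkHome, "bin"), "compute-classpath.sh");
        if (!script.isFile()) {
            return "missing " + script.getPath();
        }
        return "ok";
    }

    public static void main(String[] args) {
        // Reads the same environment variable the error message complains about.
        System.out.println(check(System.getenv("SPARK_HOME")));
    }
}
```

Run it from the same shell you launch mahout from, so it sees the same environment.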
"java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path" To fix this, the Internet will tell you to get a copy of libsnappyjava.jnilib, put it in /usr/lib/java and rename it libsnappyjava.dylib. I did "brew install snappy", which installed version 1.1.3 and included symlinks named libsnappy.dylib and libsnappy.jnilib. Note that these are just symlinks and that the names aren't quite right... So after copying and renaming the main lib file I at least got a new error, which brings us to...
"Exception in thread "main" java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I" The Internet was less forthcoming with suggestions. I did see one post saying that version 1.0.xxx didn't have whatever magic pony code but version 1.1.1.3 did. From http://central.maven.org/maven2/org/xerial/snappy/snappy-java/ I downloaded snappy-java-1.1.1.3.jar and dropped it as-is into /usr/lib/java, no name changes. This made the snappy errors go away and I could run a "mahout spark-itemsimilarity" command to completion. YMMV; this advice is provided as-is with no warranty.
Please note that snappy-error-induced despair may drive you to download the Spark .tgz and build it from scratch. The build will take up ~2 hours of your life that you will never get back, and you will still get snappy errors at the end. Ultimately I could run the same command with the hand-built version as with the brew-installed one; the snappy jar ended up being the main thing.
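If you're not sure where the JVM is actually looking for the native library, a small check along these lines may help. It is only a sketch: it walks java.library.path and lists any files with the library names mentioned above. SnappyLibraryCheck and find are hypothetical names, and finding a file says nothing about whether it is a compatible snappy-java build.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SnappyLibraryCheck {

    // Returns every existing file named like one of fileNames
    // in the directories listed in libraryPath.
    static List<File> find(String libraryPath, Set<String> fileNames) {
        List<File> hits = new ArrayList<File>();
        if (libraryPath == null) {
            return hits;
        }
        for (String dir : libraryPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String name : fileNames) {
                File candidate = new File(dir, name);
                if (candidate.isFile()) {
                    hits.add(candidate);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<String> names = new LinkedHashSet<String>();
        // The name this JVM derives for System.loadLibrary("snappyjava").
        names.add(System.mapLibraryName("snappyjava"));
        // The two file names mentioned in the fix above.
        names.add("libsnappyjava.dylib");
        names.add("libsnappyjava.jnilib");

        String libraryPath = System.getProperty("java.library.path");
        System.out.println("java.library.path = " + libraryPath);
        List<File> hits = find(libraryPath, names);
        if (hits.isEmpty()) {
            System.out.println("No snappyjava library file found on that path.");
        }
        for (File hit : hits) {
            System.out.println("Found " + hit.getPath());
        }
    }
}
```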
Upvotes: 1
Reputation: 7938
You don't need Hadoop at all to try out Mahout. Below is sample code that loads a data model from a file and prints recommendations.
package com.ml.recommend;

import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.CachingRecommender;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class App {
    public static void main(String[] args) throws IOException, TasteException {
        // Load user/item preferences from a local text file.
        DataModel model = new FileDataModel(new File("data.txt"));

        // Compare users by the Pearson correlation of their preferences.
        UserSimilarity userSimilarity = new PearsonCorrelationSimilarity(model);

        // Use the 3 most similar users as each user's neighborhood.
        UserNeighborhood neighborhood =
                new NearestNUserNeighborhood(3, userSimilarity, model);

        Recommender recommender =
                new GenericUserBasedRecommender(model, neighborhood, userSimilarity);
        Recommender cachingRecommender = new CachingRecommender(recommender);

        // Ask for up to 10 recommendations for this user ID.
        List<RecommendedItem> recommendations =
                cachingRecommender.recommend(1000000000000006075L, 10);
        System.out.println(recommendations);
    }
}
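The answer doesn't show what data.txt should contain. FileDataModel reads delimited text with one userID,itemID,preference entry per line, so a toy file like the one generated below should be enough to experiment with. Treat it as a sketch: SampleData is a hypothetical helper, the ratings are invented, and the only value taken from the code above is the user ID 1000000000000006075, included so that the recommend() call has a known user to look up. No promises about which recommendations, if any, such a tiny data set produces.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Arrays;
import java.util.List;

public class SampleData {

    // Invented preferences, one "userID,itemID,preference" entry per line.
    static List<String> sampleLines() {
        return Arrays.asList(
                "1000000000000006075,101,5.0",
                "1000000000000006075,102,3.0",
                "2,101,4.5",
                "2,102,3.5",
                "2,103,4.0",
                "3,101,2.0",
                "3,103,5.0",
                "3,104,4.0");
    }

    // Writes the sample preferences to the given file, replacing its contents.
    static void write(File target) throws IOException {
        PrintWriter out = new PrintWriter(target, "UTF-8");
        try {
            for (String line : sampleLines()) {
                out.println(line);
            }
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File target = new File("data.txt");
        write(target);
        System.out.println("Wrote " + sampleLines().size()
                + " preferences to " + target.getAbsolutePath());
    }
}
```

Run it in the directory where you start App, since App opens data.txt by relative path.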
Upvotes: 0
Reputation: 777
These 2 links helped me get up and running on OS X. It's not strictly necessary to use Hadoop with Mahout, but it would almost certainly be useful to gain experience with both as you go if you are planning to use Mahout in a scalable system ...
Upvotes: 1