Reputation: 6485
I'm going to use Microsoft Azure ML for text analysis tasks such as keyword extraction. Because my input is large, I want to know whether the ML package actually uses Hadoop (HDP) as its underlying layer. If not, how can I use Azure ML in combination with Hadoop?
Does Mahout have some text analysis tools?
Upvotes: 1
Views: 711
Reputation: 7237
Microsoft Azure ML does not use Hadoop. It uses a custom back end that runs each module of an experiment independently (and in parallel when the DAG allows).
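To illustrate what "in parallel when the DAG allows" means, here is a minimal Python sketch that groups the modules of an experiment into stages, where every module in a stage has all of its inputs ready and could run at the same time. This is not Azure ML's scheduler, only an illustration of the idea; the function name `parallel_stages` and the module names are made up for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def parallel_stages(dependencies):
    """Group DAG nodes into stages whose members can run concurrently.

    `dependencies` maps each module name to the set of modules it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    stages = []
    while sorter.is_active():
        # Everything returned here has all of its predecessors finished.
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


# Hypothetical experiment: two branches share one input and join at training.
experiment = {
    "clean_text": {"read_data"},
    "sample_rows": {"read_data"},
    "extract_features": {"clean_text"},
    "train_model": {"extract_features", "sample_rows"},
}
stages = parallel_stages(experiment)
```

In this example `clean_text` and `sample_rows` land in the same stage because neither depends on the other, while `train_model` has to wait for both branches.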
Azure ML is not a package, but is a design studio for creating and operationalizing ML solutions.
What is the size of your dataset?
Azure ML currently supports about 6 GB of data for training.
If your data needs preprocessing, I recommend doing it in HDInsight. That is also a good place to extract your specific features. Running the feature extraction module on a sample of the training data can help you determine the key columns.
Having too much data is never a bad thing. I recommend downsampling your data to small chunks of roughly 512 MB to 1 GB. Measure your accuracy at that data size, then scale up by 2x or 3x at a time, up to 6 GB, and see how much accuracy you gain at each step.
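The downsample-then-scale-up procedure can be sketched in plain Python as below. This is only a rough outline under a few assumptions: the helper names `sample_sizes` and `learning_curve` are made up for the example, `train_and_score` is a placeholder for whatever training and evaluation you actually run (for instance an Azure ML experiment), and the sizes are counted in whatever unit you sample by (rows, or MB if you sample by file size).

```python
import random


def sample_sizes(start, limit, factor=2):
    """Yield start, start*factor, ... and always finish with limit itself."""
    size = start
    while size < limit:
        yield size
        size *= factor
    yield limit


def learning_curve(rows, sizes, train_and_score, seed=0):
    """Return [(size, score)] using a random subsample of each size.

    `train_and_score` is a hypothetical callback: it receives the sampled
    rows and returns an accuracy-like number.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    results = []
    for size in sizes:
        subset = rng.sample(rows, min(size, len(rows)))
        results.append((size, train_and_score(subset)))
    return results


# Schedule in MB: start at 512 MB, double each step, stop at the 6 GB cap.
schedule_mb = list(sample_sizes(512, 6 * 1024))
# schedule_mb == [512, 1024, 2048, 4096, 6144]
```

If the score stops improving between two steps, the extra data is probably not buying you much and you can stay at the smaller size.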
Upvotes: 4