H2O modelling: stand-alone k-means or regression code

Question

I am very new to H2O and to running models on hives. I am considering H2O because my understanding is that it helps to optimize data parsing during various modelling processes (such as k-means or logistic regression). My question is: is there a way for me to write my own Python (or R) k-means code and run it in H2O, or is the only way to use the H2O pre-built process? If it's the latter, can I extract the final scoring code in order to schedule it for automated, regular scoring runs? And if the first option is also possible (I noticed the option "import the code"), how would the parsing happen during the process (for instance, during data preparation, variable standardization, the actual k-means scoring code, and assigning the final segment rules)?

Thank you

Natalie

Erin LeDell · Accepted Answer

You will need to use the H2OKMeansEstimator class in Python (or the h2o.kmeans() function in R). There is no way to "import" your own K-means code into H2O, which I think is what you're asking. More information about H2O's K-means implementation is available in the H2O documentation.
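As a rough illustration of the Python route, here is a minimal sketch that trains H2O's built-in K-means. It assumes a local H2O installation; the function name `fit_kmeans`, its arguments, and the seed value are hypothetical choices for the example, not part of H2O.

```python
def fit_kmeans(csv_path, feature_cols, k=3):
    """Train H2O's built-in K-means and return (model, segments)."""
    # Imported inside the function so this file still loads
    # on machines where the h2o package is not installed.
    import h2o
    from h2o.estimators import H2OKMeansEstimator

    # Start a local H2O cluster, or connect to one that is already running.
    h2o.init()

    # The file is parsed inside the H2O cluster, not in the Python process.
    frame = h2o.import_file(csv_path)

    # standardize=True asks H2O to standardize numeric columns before clustering.
    # The seed is an arbitrary value chosen for reproducibility.
    model = H2OKMeansEstimator(k=k, standardize=True, seed=42)
    model.train(x=feature_cols, training_frame=frame)

    # predict() returns an H2OFrame with one cluster assignment per row.
    segments = model.predict(frame)
    return model, segments
```

A call might look like `model, segments = fit_kmeans("customers.csv", ["age", "income"], k=4)`, where the file and column names are placeholders. With this approach the parsing, standardization, and segment assignment all run inside H2O rather than in your own code.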

To export an H2O K-means model for scoring in a production environment, you can export the model as a POJO or MOJO (Java scoring artifacts that run without an H2O cluster and need only the h2o-genmodel library), or you can save a binary H2O model, which requires an H2O cluster to be running at scoring time.
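The two export routes could be sketched as follows. This assumes `model` is an already trained H2O model and that the h2o Python package is installed; the helper name `export_model` and the `out_dir` argument are hypothetical.

```python
def export_model(model, out_dir):
    """Write a MOJO and a binary copy of a trained H2O model to out_dir."""
    # Imported inside the function so this file still loads
    # on machines where the h2o package is not installed.
    import h2o

    # MOJO: scored from Java without a running H2O cluster.
    # get_genmodel_jar=True also downloads h2o-genmodel.jar, which the scorer needs.
    mojo_path = model.download_mojo(path=out_dir, get_genmodel_jar=True)

    # Binary model: can only be reloaded into a running H2O cluster,
    # for example with h2o.load_model(binary_path).
    binary_path = h2o.save_model(model=model, path=out_dir, force=True)

    return mojo_path, binary_path
```

For a scheduled scoring job, the MOJO is usually the lighter option because the job does not have to start a cluster; the binary model is the simpler choice if a cluster is already running at scoring time.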
