Reputation: 1208
This question is similar to "Converting R dataframe to H2O Frame without writing to disk", except that it applies to Java objects.
My data is generated in a Java application, then saved as text and passed to H2O (through R or Flow). I guess I could avoid some overhead by creating (and saving) H2O frames on the fly inside my application. I suspect it's pretty straightforward, but a quick look at the docs didn't turn up an easy (SO-style) answer.
Upvotes: 1
Views: 487
Reputation: 21
I spent ages trying to work out the same thing, as this isn't documented anywhere: that is, how to create an H2O frame without (slowly) round-tripping to disk.
Here's my in-memory Java solution for converting a Guava table to an H2O frame. It should be trivial to adapt to your data structure.
I haven't yet worked out how to split this into chunks, though (or what the performance implications of not doing so are). Maybe someone who knows H2O better can comment on this?
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

import water.DKV;
import water.Key;
import water.fvec.Frame;
import water.fvec.Vec;
import water.fvec.Vec.VectorGroup;

/**
 * Converts a Guava Table to an H2O Frame (and registers it in the Distributed
 * Key/Value store) using only in-memory operations.
 *
 * TODO everything is contained in a single chunk. Not sure of performance implications of this...
 *
 * @param t        the Guava table
 * @param tableKey a unique key to register the table in H2O under
 * @return an H2O Frame
 * @throws IOException
 */
public static Frame tableToFrame(LinkedHashBasedTable<Integer, String, Double> t, String tableKey) throws IOException {
    Set<String> cols = t.columnKeySet();
    List<Vec> vecs = new ArrayList<>();
    // Make a common group to contain the vector keys. This has something to do
    // with Chunk distribution among nodes for parallel processing.
    VectorGroup group = new VectorGroup();
    for (String col : cols) {
        // One Vec per table column, built from that column's values.
        double[] vals = toDoubleArray(t.column(col).values());
        Key<Vec> key = group.addVec();
        Vec v = Vec.makeVec(vals, key);
        vecs.add(v);
    }
    String[] names = cols.toArray(new String[cols.size()]);
    Vec[] vecArr = vecs.toArray(new Vec[vecs.size()]);
    Key<Frame> frameKey = Key.make(tableKey);
    Frame frame = new Frame(frameKey, names, vecArr);
    // Register the new Frame in the DKV, so H2O jobs can find it.
    DKV.put(frameKey, frame);
    logger.info("Converted Table to Frame with " + frame.numRows() + " rows and " + frame.numCols() + " cols");
    return frame;
}
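The snippet above calls a `toDoubleArray` helper that isn't shown. A minimal sketch of what it might look like follows, using only the JDK. The class name `ColumnArrays` and the second method, `toAlignedDoubleArray`, are hypothetical additions, not part of the original answer. The second method is there because `t.column(col).values()` only returns the cells that actually exist, so a sparse table could yield columns of different lengths. It pads missing cells with `NaN`, on the assumption that H2O treats `NaN` in a numeric `Vec` as a missing value.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;

final class ColumnArrays {
    private ColumnArrays() {}

    /** Copies boxed values into a primitive array, in iteration order. */
    static double[] toDoubleArray(Collection<Double> values) {
        double[] out = new double[values.size()];
        int i = 0;
        for (Double v : values) {
            // Assumption: a null value should be treated as missing (NaN).
            out[i++] = (v == null) ? Double.NaN : v;
        }
        return out;
    }

    /**
     * Hypothetical variant: builds one array entry per row key, so every
     * column ends up with the same length even if some cells are absent.
     */
    static <R> double[] toAlignedDoubleArray(Collection<R> rowKeys, Map<R, Double> column) {
        double[] out = new double[rowKeys.size()];
        int i = 0;
        for (R row : rowKeys) {
            Double v = column.get(row);
            out[i++] = (v == null) ? Double.NaN : v;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Double> column = Arrays.asList(1.5, 2.0, 3.0);
        System.out.println(Arrays.toString(toDoubleArray(column)));
    }
}
```

With a Guava table, the aligned variant would presumably be called as `toAlignedDoubleArray(t.rowKeySet(), t.column(col))`, so that row `i` means the same thing in every `Vec`.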
Upvotes: 2
Reputation: 510
You can use the REST API to drive your H2O instance from your Java app, but this will still require you to save your data and then call something like ImportFiles (http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/2/docs-website/h2o-docs/index.html#route-%2F3%2FImportFiles).
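As a rough sketch of the REST route, the request URL for the `/3/ImportFiles` endpoint linked above could be built like this. The class and method names are hypothetical, the base URL `http://localhost:54321` is only an assumed default for a local H2O instance, and `path` is assumed to be the query parameter naming the file that the H2O node should read. Check the linked route documentation for the exact parameters in your release.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class H2oImportFilesUri {
    private H2oImportFilesUri() {}

    /**
     * Builds the URI for an ImportFiles request.
     *
     * @param baseUrl address of the H2O instance, e.g. "http://localhost:54321" (assumed)
     * @param path    file path as seen by the H2O node, not by the client
     */
    static URI importFilesUri(String baseUrl, String path) {
        // URL-encode the path so slashes and spaces survive in the query string.
        String encoded = URLEncoder.encode(path, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/3/ImportFiles?path=" + encoded);
    }

    public static void main(String[] args) {
        System.out.println(importFilesUri("http://localhost:54321", "/tmp/data.csv"));
    }
}
```

The resulting URI can be requested with any HTTP client. Importing only makes the raw file known to H2O, so a separate parse step is still needed before it becomes a frame.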
If I understood your question correctly, there is currently no way to 'stream' your data from your app to the H2O instance the way you're suggesting. You could do this if you bundle H2O directly into your app but I'm not sure that's what you want.
Upvotes: 1