Reputation: 7752
After building a prototype in R (using dplyr), I need to build a model that is deployable to our Java-based server infrastructure. Right now, I'm using the JSAT machine-learning library.
What is the best way to wrangle data?
None of the collection-like types from the JSAT package (ClassificationDataSet, RegressionDataSet, DataSet) seem to support even basic tasks like:
Upvotes: 1
Views: 501
Reputation: 6514
1) This isn't currently supported in JSAT. JSAT is a source of machine learning algorithms; dataframe-like operations are not a goal of the project in any way. I'm not sure why you would want to filter out data in a production system. There is no reason you couldn't do that in a better tool and then export the data to have JSAT build the model.
2) All DataSet objects inherit a randomSplit method that can do what you have asked for. An example of that is here.
3) See 1. I'm not sure what the use case is for adding "new rows based on the values of other rows". All the different DataSet classes support adding new data points; you just have to create them yourself.
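If the wrangling does have to happen on the Java side, the three points above could be handled in plain Java before anything is handed to JSAT. The following is only a minimal sketch under stated assumptions: it does not use JSAT at all, the class RowWranglingSketch and its methods are hypothetical names made up for illustration, rows are assumed to be equal-length double[] arrays, and the split shown here is a simple shuffle-and-cut, not necessarily what JSAT's own randomSplit does internally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

// Hypothetical helper, not part of JSAT. Rows are plain double[] feature arrays
// and are assumed to all have the same length.
final class RowWranglingSketch {
    private RowWranglingSketch() {}

    /** Point 1: keeps only the rows that satisfy the predicate. */
    static List<double[]> filter(List<double[]> rows, Predicate<double[]> keep) {
        List<double[]> out = new ArrayList<>();
        for (double[] row : rows) {
            if (keep.test(row)) {
                out.add(row);
            }
        }
        return out;
    }

    /** Point 2: shuffles a copy of the rows and cuts it into two parts. */
    static List<List<double[]>> randomSplit(List<double[]> rows, double firstFraction, Random rand) {
        if (firstFraction < 0.0 || firstFraction > 1.0) {
            throw new IllegalArgumentException("firstFraction must be in [0, 1]");
        }
        List<double[]> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, rand);
        int cut = (int) Math.round(firstFraction * shuffled.size());
        List<List<double[]>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }

    /** Point 3: returns a copy with one derived row appended (the column-wise mean). */
    static List<double[]> withMeanRow(List<double[]> rows) {
        List<double[]> out = new ArrayList<>(rows);
        if (rows.isEmpty()) {
            return out;
        }
        double[] mean = new double[rows.get(0).length];
        for (double[] row : rows) {
            for (int j = 0; j < mean.length; j++) {
                mean[j] += row[j] / rows.size();
            }
        }
        out.add(mean);
        return out;
    }
}
```

Each resulting double[] would then be turned into a data point and added to the appropriate DataSet class, as described in point 3. The exact JSAT calls for that step are not shown here and should be taken from the JSAT Javadoc.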
source: I'm the author of JSAT
Upvotes: 1