
Python API to load various machine-learning datasets?

Does anyone have a Python API for loading various ML datasets, along the lines of:

X, Y, info = mldata.load(name, db=..., verbose=...)
X: N x dim data, a NumPy array
Y: N ints for class numbers, or None
info: a dict with ...
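As a rough illustration of that signature, here is a minimal sketch for one easy case: a comma-separated file with numeric features, no header row, and the class label in a single column (the layout of, e.g., UCI's Iris data). The function name `load_csv`, its `label_column` parameter, and the `info` keys are hypothetical choices for this example, not an existing library API.

```python
import csv
from typing import Any, Dict, Optional, Tuple

import numpy as np


def load_csv(path: str, label_column: Optional[int] = -1
             ) -> Tuple[np.ndarray, Optional[np.ndarray], Dict[str, Any]]:
    """Hypothetical loader: return (X, Y, info) for a CSV file of numeric features.

    Assumes no header row. label_column is the index of the class-label column,
    or None if the file is unlabelled.
    """
    with open(path, newline="") as f:
        rows = [row for row in csv.reader(f) if row]  # skip blank lines
    if not rows:
        raise ValueError(f"{path}: no data rows")
    labels = None
    if label_column is not None:
        labels = [row.pop(label_column) for row in rows]  # strip the label from each row
    X = np.array([[float(v) for v in row] for row in rows])  # N x dim float array
    info: Dict[str, Any] = {"source": path, "nrow": X.shape[0], "ncol": X.shape[1]}
    if labels is None:
        return X, None, info
    classes = sorted(set(labels))  # stable order, so the integer codes are reproducible
    code = {name: i for i, name in enumerate(classes)}
    Y = np.array([code[name] for name in labels], dtype=int)
    info["classes"] = classes  # info["classes"][k] is the name of class k
    return X, Y, info
```

Sorting the class names keeps the integer codes in `Y` the same across runs, and `info["classes"]` maps each code back to its original label.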

I'd prefer straight Python with NumPy, but if an Rpy function could simply fetch the data, that might be OK (sorry, I don't speak much R).

For a "db", a flat file would be fine, like:

#! http://archive.ics.uci.edu/ml/machine-learning-databases
# ncol  nrow  nclass  year  name               etc.
  3  2858  2  2008   "Character+Trajectories"  Time-Series     Classification, Clus
  4   150  2  1988   "Iris"    Multivariate    Classification  Real
  8   768  2  1990   "Pima+Indians+Diabetes"   Multivariate    Classification  Inte
...
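A flat index like the one above is easy to parse without a database. This is a minimal sketch, assuming the layout shown: an optional `#!` line holding a base URL, other `#` lines as comments, and one dataset per line as `ncol nrow nclass year "name" extra...`. `parse_catalog` is a hypothetical helper; `shlex.split` keeps the quoted name as one field and drops the quotes.

```python
import shlex
from typing import Dict, List, Tuple


def parse_catalog(text: str) -> Tuple[str, List[Dict[str, object]]]:
    """Hypothetical parser for a flat-file dataset index.

    Returns (base_url, entries), one dict per dataset line.
    """
    base_url = ""
    entries: List[Dict[str, object]] = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#!"):
            base_url = stripped[2:].strip()  # e.g. the UCI archive root
            continue
        if stripped.startswith("#"):
            continue  # column header or other comment
        fields = shlex.split(stripped)  # quoted name stays a single field
        ncol, nrow, nclass, year = (int(v) for v in fields[:4])
        entries.append({
            "ncol": ncol, "nrow": nrow, "nclass": nclass, "year": year,
            "name": fields[4],
            "extra": fields[5:],  # free-form columns (type, task, ...)
        })
    return base_url, entries
```

Once parsed, the "sort/awk" step is ordinary Python, e.g. `sorted(entries, key=lambda e: e["nrow"])` or `[e for e in entries if e["ncol"] < 10]`.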

Why just flat files instead of "real" databases? Because I can download them once, then browse, sort, and awk them with near-zero effort; others may prefer a fancy search engine.

Whether the data is stored locally or loaded over the web doesn't matter to me. (Maybe support both, via an environment variable such as MLDATAPATH = ( local dir ... url ... )?)
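The MLDATAPATH idea could look roughly like the sketch below. It assumes, hypothetically, that entries are separated by whitespace rather than `os.pathsep`, because on POSIX `os.pathsep` is `:`, which also appears in every `http://` URL. Local directories and URL prefixes are tried in order, and a URL hit is downloaded once into a cache directory. `find_dataset` and the default cache location are illustrative choices, not an existing tool.

```python
import os
import urllib.request
from pathlib import Path
from typing import Optional


def find_dataset(filename: str, search_path: Optional[str] = None,
                 cache_dir: str = "~/.mldata_cache") -> Path:
    """Hypothetical resolver: return a local path to `filename`.

    search_path defaults to $MLDATAPATH: whitespace-separated entries, each
    either a local directory or an http(s) URL prefix, tried in order.
    """
    if search_path is None:
        search_path = os.environ.get("MLDATAPATH", "")
    for entry in search_path.split():
        if entry.startswith(("http://", "https://")):
            target = Path(cache_dir).expanduser() / filename
            if target.exists():
                return target  # downloaded earlier (cache is keyed by filename only)
            target.parent.mkdir(parents=True, exist_ok=True)
            try:
                urllib.request.urlretrieve(entry.rstrip("/") + "/" + filename, target)
            except OSError:  # URLError/HTTPError are OSError subclasses
                continue  # not available here; try the next entry
            return target
        candidate = Path(entry).expanduser() / filename
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"{filename!r} not found on MLDATAPATH")
```

Whitespace separation means local paths containing spaces are not supported; a real tool might prefer `;` or a small config file.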

(A basic API ought to be trivial for sites with uniform names and uniform data, but making, e.g., the UCI ML repository uniform looks like quite a lot of dull work.)

Upvotes: 2

Answers (2)

Tirtha

Reputation: 738

You can check out this package for searching and downloading any dataset in the UCI ML repository. It does not load a dataset into a Python object; it just searches the portal and downloads the datasets you choose. You can even select all datasets of a certain size or ML task category.

https://github.com/tirthajyoti/UCI-ML-API

Upvotes: 0

Yannick Versley

Reputation: 780

The folks behind Scikits.learn solved that problem in the Scikits.learn examples.

Datasets come in all shapes and sizes, though, so they do have custom code for each dataset. (It would be different if you only had, say, CSV or ARFF datasets and not also grayscale images and whatnot.)

Upvotes: 1
