tiktok

Reputation: 157

How to train sklearn models using tensorflow dataset?

I'd like to know whether a TensorFlow Dataset can be used to train scikit-learn models and models from other ML frameworks.

For example, can I train XGBoost, LogisticRegression, or RandomForestClassifier models on a tf.data.Dataset? That is, can I pass the tf.data.Dataset object directly to the .fit() method of these models?

I tried out:

    xs = np.asarray([i for i in range(10000)]).reshape(-1, 1)
    ys = np.asarray([int(i % 2 == 0) for i in range(10000)])

    xs = tf.data.Dataset.from_tensor_slices(xs)
    ys = tf.data.Dataset.from_tensor_slices(ys)
    cls.fit(xs, ys)

I'm getting the following error:

    TypeError: float() argument must be a string or a number, not 'TensorSliceDataset'

Upvotes: 4

Views: 2456

Answers (1)

Miguel Trejo

Reputation: 6667

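scikit-learn estimators expect array-like inputs such as NumPy arrays, not a tf.data.Dataset, so convert the dataset first with the as_numpy_iterator() method. From the docs:

"Returns an iterator which converts all elements of the dataset to numpy."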

Following your example:

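    from sklearn.svm import SVC

    # Materialize each dataset as a list of NumPy values
    # (xs and ys are the tf.data.Dataset objects from the question)
    x = list(xs.as_numpy_iterator())
    y = list(ys.as_numpy_iterator())

    clf = SVC(gamma='auto')

    # fit() now receives plain array-likes instead of Dataset objects
    clf.fit(x, y)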
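
As a variation on the same idea, here is a minimal sketch that keeps features and labels together in one dataset and converts it batch by batch instead of element by element. The names `features`, `labels`, `dataset`, `X` and `y`, the batch size of 1000, and the choice of LogisticRegression are my own assumptions for illustration. It still loads the full dataset into memory before calling fit().

```python
import numpy as np
import tensorflow as tf
from sklearn.linear_model import LogisticRegression

# Same toy data as in the question: integers 0..9999, label 1 for even numbers.
features = np.arange(10000, dtype=np.float32).reshape(-1, 1)
labels = (np.arange(10000) % 2 == 0).astype(np.int64)

# A single dataset of (x, y) pairs keeps features and labels aligned.
# The batch size of 1000 is an arbitrary choice for this sketch.
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(1000)

# as_numpy_iterator() yields (x_batch, y_batch) tuples of NumPy arrays;
# zip(*...) separates them into a tuple of x batches and a tuple of y batches.
x_batches, y_batches = zip(*dataset.as_numpy_iterator())
X = np.concatenate(x_batches)
y = np.concatenate(y_batches)

# Any scikit-learn estimator can now be trained on the plain arrays.
clf = LogisticRegression()
clf.fit(X, y)
```

If the data does not fit in memory, one option is to loop over the batches and feed them to an estimator that supports partial_fit(), such as SGDClassifier, rather than concatenating everything first.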

Upvotes: 1
