Reputation: 157
Can I use a TensorFlow Dataset to train scikit-learn models and models from other ML frameworks?
For example, can I use a tf.data.Dataset to train XGBoost, LogisticRegression, RandomForestClassifier, etc.?
That is, can I pass the tf.data.Dataset object directly to the .fit() method of these models?
I tried out:
import numpy as np
import tensorflow as tf

xs = np.arange(10000).reshape(-1, 1)
ys = (np.arange(10000) % 2 == 0).astype(int)
xs = tf.data.Dataset.from_tensor_slices(xs)
ys = tf.data.Dataset.from_tensor_slices(ys)
# cls: the classifier being trained (its definition is not shown here)
cls.fit(xs, ys)
I'm getting the following error:
TypeError: float() argument must be a string or a number, not 'TensorSliceDataset'
Upvotes: 4
Views: 2456
Reputation: 6667
You can use the as_numpy_iterator() method. From the docs:
Returns an iterator which converts all elements of the dataset to numpy.
Following your example:
from sklearn.svm import SVC
x = list(xs.as_numpy_iterator())
y = list(ys.as_numpy_iterator())
clf = SVC(gamma='auto')
clf.fit(x, y)
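If your dataset yields (features, label) pairs in batches rather than two separate unbatched datasets, you would need to stitch the batches back together before calling .fit(). Here is a minimal sketch of that step. The helper name batches_to_arrays is my own (hypothetical, not part of TensorFlow or scikit-learn), and the plain NumPy list below only stands in for what ds.batch(n).as_numpy_iterator() would yield, so the snippet runs without TensorFlow.

```python
import numpy as np


def batches_to_arrays(batches):
    """Concatenate an iterable of (features, labels) NumPy batches into two arrays.

    Hypothetical helper: `batches` is assumed to yield pairs like those produced by
    tf.data.Dataset.from_tensor_slices((xs, ys)).batch(n).as_numpy_iterator().
    """
    x_parts, y_parts = [], []
    for x_batch, y_batch in batches:
        x_parts.append(np.asarray(x_batch))
        y_parts.append(np.asarray(y_batch))
    # Join along the first (sample) axis
    return np.concatenate(x_parts), np.concatenate(y_parts)


# Stand-in for a batched dataset iterator: 10 samples in batches of 4
features = np.arange(10).reshape(-1, 1)
labels = (features.ravel() % 2 == 0).astype(int)
fake_batches = [(features[i:i + 4], labels[i:i + 4]) for i in range(0, 10, 4)]

X, y = batches_to_arrays(fake_batches)
```

With a real dataset, something like X, y = batches_to_arrays(ds.as_numpy_iterator()) followed by clf.fit(X, y) should work the same way. Note that this loads everything into memory. If the data does not fit, an estimator that supports partial_fit (for example SGDClassifier) could probably be fed one batch at a time instead.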
Upvotes: 1