Nothing More

Reputation: 933

Are there similar datasets to MNIST?

I am doing research on machine learning and want to test my algorithms on some well-known datasets. Since I am a newbie in this area, I can't find other suitable datasets apart from MNIST. I think MNIST is quite suitable for our research. Does anyone know of datasets similar to MNIST?

P.S. I know of another commonly used handwritten digit dataset, the USPS dataset. But I need a dataset with more training examples (more than 10,000, comparable to the number of training examples in MNIST), so USPS is ruled out.

Upvotes: 9

Views: 12520

Answers (3)

Joy

Reputation: 97

I know this question is old, but I hope my suggestions can still be useful. I was also looking for datasets similar to handwritten MNIST and Fashion MNIST. PyTorch (through torchvision) provides several of them with documentation: KMNIST, QMNIST, USPS, SEMEION, and SVHN, amongst others. Check here for the full list.
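To make this concrete, here is a minimal sketch of how one might shortlist and load these through torchvision. The helper names (`candidates`, `load_train_split`) and the `./data` directory are my own placeholders, not part of any library, and the training-set sizes in the table are the commonly documented figures, so please verify them before relying on them. I only included datasets whose torchvision class takes a `train` argument; SVHN uses `split="train"` instead and SEMEION has no train/test split argument, so they would need slightly different calls.

```python
# Commonly documented training-set sizes (assumption: check each dataset's docs).
TRAIN_SIZES = {
    "FashionMNIST": 60000,
    "KMNIST": 60000,
    "QMNIST": 60000,
    "USPS": 7291,
}


def candidates(min_train=10000):
    """Return dataset class names with more than `min_train` training examples."""
    return sorted(name for name, size in TRAIN_SIZES.items() if size > min_train)


def load_train_split(name, root="./data"):
    """Download (if needed) and return the training split of a torchvision dataset.

    `name` must be a torchvision.datasets class that accepts `train=True`,
    for example one of the keys in TRAIN_SIZES.
    """
    # Imported lazily so the shortlist above works without torchvision installed.
    from torchvision import datasets

    dataset_cls = getattr(datasets, name)
    return dataset_cls(root=root, train=True, download=True)


if __name__ == "__main__":
    # Lists the datasets that meet the ">10000 training examples" requirement.
    print(candidates())
```

Calling `load_train_split("KMNIST")` would then give you a dataset object you can index or wrap in a `DataLoader`, the same way you would with MNIST.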

Upvotes: 1

aliakbars

Reputation: 61

You can try Fashion MNIST or Kuzushiji MNIST, which have very similar properties to MNIST but are a bit harder to predict. From Fashion MNIST's page:

Seriously, we are talking about replacing MNIST. Here are some good reasons:

  • MNIST is too easy. Convolutional nets can achieve 99.7% on MNIST. Classic machine learning algorithms can also achieve 97% easily. Check out our side-by-side benchmark for Fashion-MNIST vs. MNIST, and read "Most pairs of MNIST digits can be distinguished pretty well by just one pixel."
  • MNIST is overused. In this April 2017 Twitter thread, Google Brain research scientist and deep learning expert Ian Goodfellow calls for people to move away from MNIST.
  • MNIST can not represent modern CV tasks, as noted in this April 2017 Twitter thread, deep learning expert/Keras author François Chollet.

Upvotes: 5

corrin

Reputation: 63

The UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/) contains quite a variety of datasets, including ones that, like MNIST, are suitable for classification, e.g. Skin Segmentation (http://archive.ics.uci.edu/ml/datasets/Skin+Segmentation).

I can't say which of them would be suitable without knowing what you're trying to demonstrate with your algorithm, but anything in the UCI archive is well known.

Upvotes: 5
