Abin John Thomas

Reputation: 169

Explain dataset in scikit

I am using scikit-learn to learn about machine learning, following the tutorial "An introduction to machine learning with scikit-learn".

Here the data is loaded into the variable digits. digits.data gives us access to the data, which is an 8 * 8 matrix. My question is: what do the values in digits.data refer to, and why is the maximum value restricted to 16?

My best guess is that it's the grayscale value of each pixel. If so, what is the difference between digits.data and digits.images?

Thanks

Upvotes: 0

Views: 67

Answers (1)

Roy Jevnisek

Reputation: 320

digits.images holds the raw images. digits.data holds the features (in this case simply the raw pixel values; as you progress through the tutorial, this will change to more sophisticated features). digits.data is shaped differently, in a way more natural for learning: each row corresponds to a single image, flattened to 64 values. Hence if you try:

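 import numpy as np
 import matplotlib.pyplot as plt
 from sklearn import datasets
 digits = datasets.load_digits()  # load the dataset as in the tutorial
 plt.imshow(digits.images[0], cmap="gray")  # first image, already an 8x8 array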

and:

 plt.imshow(np.reshape(digits.data[0, :], (8, 8)), cmap="gray")

you will get the same result.
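
If you prefer to check this without plotting, here is a minimal sketch that compares the two arrays directly and prints the value range. It assumes the dataset is loaded with datasets.load_digits() as in the tutorial; the names flattened and same are only illustrative. As for the maximum of 16, the dataset description in digits.DESCR indicates that each 8x8 entry is a count of "on" pixels in a 4x4 block of the original 32x32 bitmap, which would explain why values run from 0 to 16.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# digits.images keeps each sample as an 8x8 array;
# digits.data stores the same samples as flat rows of 64 values.
print(digits.images.shape)
print(digits.data.shape)

# Flatten every image and compare against digits.data.
flattened = digits.images.reshape(len(digits.images), -1)
same = np.array_equal(flattened, digits.data)
print(same)

# Smallest and largest pixel values in the dataset.
print(digits.data.min(), digits.data.max())
```

If same prints True, the two attributes hold identical values and differ only in shape.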

Upvotes: 1
