Reputation: 31
I'm currently using sklearn to build a simple image recogniser.
I need to use load_files('./directory/') to load images from sub-folders within that directory.
It correctly gets the target values, but the data attribute does not contain simple pixel values. I assume I need to set the encoding parameter so that it handles image files, but I can't find what exactly to use.
Upvotes: 3
Views: 5412
Reputation: 40169
The encoding parameter is used to decode the raw bytes of the content of the files assuming a text encoding (e.g. UTF-8).
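For image files, you will need to iterate over the paths in the filenames attribute yourself and read each one with something like scipy.misc.imread (you will also need to install the PIL or Pillow package). Note that scipy.misc.imread was removed in SciPy 1.2; imageio.imread or PIL.Image.open are the usual replacements.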
Here is a utility function that loads the JPEG files of the Labeled Faces in the Wild dataset as numpy arrays:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/datasets/lfw.py#L108
You can use it to understand how to write your own custom dataset loader.
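As a rough sketch of such a custom loader (not the LFW code linked above), something like the following could work. It assumes Pillow is installed, and the helper name load_image_folder, the grayscale conversion, and the fixed 32x32 size are all illustrative choices, not part of scikit-learn. It calls load_files with load_content=False so that only the file paths and targets are collected, then reads the pixels separately.

```python
import numpy as np
from PIL import Image
from sklearn.datasets import load_files


def load_image_folder(container_path, size=(32, 32)):
    """Hypothetical helper: read every image under container_path into one array."""
    # load_content=False skips reading file contents; we only want paths and labels.
    bunch = load_files(container_path, load_content=False, shuffle=False)
    rows = []
    for path in bunch.filenames:
        with Image.open(path) as img:
            # Assumption: grayscale at a fixed size, so every row has equal length.
            img = img.convert("L").resize(size)
            # Flatten to a 1-D feature vector scaled to [0, 1].
            rows.append(np.asarray(img, dtype=np.float32).ravel() / 255.0)
    X = np.stack(rows)
    return X, bunch.target, bunch.target_names
```

Calling X, y, names = load_image_folder('./directory/') should then give one flattened pixel row per image in X, with y holding the sub-folder index of each image and names the sub-folder names.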
Upvotes: 4