Reputation: 1694
Hi, I tried to apply PCA to a folder containing many .jpg images. However, I am stuck on converting them to a format that scikit-learn's PCA accepts. It seems that PCA takes array data. I read articles such as "PCA for image data", but they look quite complicated to me. I just want to convert the images to an accepted format and then call pca.fit.
Previously I used os.walk to convert the images to grayscale and resize them (as below). I was wondering if I can use the same approach for PCA.
from sklearn.decomposition import PCA
from PIL import Image
import os
import numpy as np

WORK_DIR = 'D:/folder/'  # working folder
source = os.path.join(WORK_DIR, 'train')
target = os.path.join(WORK_DIR, 'gray')

for root, dirpath, filenames in os.walk(source):
    for file in filenames:
        image_file = Image.open(os.path.join(root, file))
        image_file.draft('L', (256, 128))
        image_file.save(os.path.join(target, file))
Any easier method would be great too.
Upvotes: 2
Views: 1733
Reputation: 16966
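After reading the image data, each grayscale image is a 2D array. PCA expects one row per sample, so you have to flatten each image into a 1D vector; .flatten() does that. All images must have the same size so that the rows have equal length. You can then pass this data to pca.fit().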
from sklearn.decomposition import PCA
from PIL import Image
import os
import numpy as np

WORK_DIR = 'D:/folder/'  # working folder
source = os.path.join(WORK_DIR, 'train')

train_data = []
for root, dirpath, filenames in os.walk(source):
    for file in filenames:
        image_file = os.path.join(root, file)
        print(image_file)
        # Convert to grayscale and a fixed size so every flattened row has the same length
        img = Image.open(image_file).convert('L').resize((256, 128))
        train_data.append(np.array(img).flatten())

pca = PCA()
pca.fit(np.array(train_data))  # array shape: (n_images, 128 * 256)
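To check the shapes without touching the disk, here is a minimal sketch of the same flatten-then-fit idea on made-up data. The helper name images_to_matrix, the random stand-in images, and the choice of n_components=5 are assumptions for illustration, not part of the original code.

```python
import numpy as np
from sklearn.decomposition import PCA


def images_to_matrix(images):
    """Stack equally sized 2D grayscale arrays into an (n_images, n_pixels) matrix."""
    return np.stack([np.asarray(img, dtype=np.float64).ravel() for img in images])


# Hypothetical stand-in data: 20 random "images", 128 rows by 256 columns each.
rng = np.random.default_rng(0)
fake_images = [rng.integers(0, 256, size=(128, 256)) for _ in range(20)]

X = images_to_matrix(fake_images)   # one flattened image per row
pca = PCA(n_components=5)           # keep only the first 5 components
X_reduced = pca.fit_transform(X)    # one 5-dimensional vector per image

# Each principal component has one weight per pixel, so it can be
# reshaped back to the image dimensions for inspection.
first_component = pca.components_[0].reshape(128, 256)
```

With real files you would replace fake_images with the arrays read in the loop above. Limiting n_components is usually worthwhile here, because the default keeps as many components as there are images or pixels, whichever is smaller.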
Upvotes: 1