Eugene Fitzher

Reputation: 163

python Deep learning load image dataset

I'm new to deep learning, and I feel a bit stupid for asking so many questions in a single day.

I'm loading images with glob, and I've run into a problem.

I planned to load the images into a single NumPy array with the shape width * height * image count. I want 128*128*242, but the result is (128, 128, 3), and I don't know where the '3' comes from.

I think one of the problems is that I need to load all the images, but on every iteration I overwrite the data with the new image instead of accumulating them, so the image count is never tracked.

I'm studying deep learning so that I can eventually build models on my own, so please help.

Data example:

[[255 255 255]
 [255 255 255]
 [255 255 255]
      ...
 [255 255 255]
 [255 255 255]
 [255 255 255]]

Here's my code

def _load_img():
    # I get data_list with
    # data_list = glob('dataset\\training\\*\\*.jpg')
    for v in data_list:
        data = np.array(Image.open(v))

    # img_size: 128 * 128
    # I reshaped the data to get 242, the number of images.
    data = data.reshape(-1, img_size)

    return data

Upvotes: 0

Views: 3478

Answers (1)

wookiekim

Reputation: 1176

Images are typically stored in RGB format, so each one has 3 channels. That is where the '3' comes from: a single image is 128 * 128 * 3, and all 242 images together are 242 * 128 * 128 * 3.

Since you do not want to lose your RGB information, you could use:

data = data.reshape(-1, img_size * 3)

This is why conventional image-processing architectures set their input channel count to 3, for RGB. Grayscale datasets such as MNIST have only one channel, though.
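To make the shapes concrete, here is a minimal sketch that uses synthetic all-white arrays in place of real images (the 242 count and 128 x 128 size come from the question; the channel average is only an assumed, crude way to get a single channel, not something the question's code does):

```python
import numpy as np

# Synthetic stand-ins for 242 loaded RGB images of 128 x 128 pixels.
n_images, height, width = 242, 128, 128
batch = np.full((n_images, height, width, 3), 255, dtype=np.uint8)

# Keep the colour information: one flat row of 128 * 128 * 3 values per image.
flat_rgb = batch.reshape(n_images, -1)
print(flat_rgb.shape)  # (242, 49152)

# Drop the colour axis with a plain channel average (a crude grayscale).
gray = batch.mean(axis=-1)
print(gray.shape)  # (242, 128, 128)

# Move the image index to the last axis: height x width x count.
hwc = np.transpose(gray, (1, 2, 0))
print(hwc.shape)  # (128, 128, 242)
```

The last array has the 128 * 128 * 242 layout asked for in the question, at the cost of discarding colour.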

[Figure: VGGNet architecture]

Note the 224 x 224 x 3 in the beginning.

Image size: 224 X 224
Channel size (RGB) : 3

[EDIT]

for v in data_list:
    data = np.array(Image.open(v))

You are not adding your newly created array anywhere, so each iteration overwrites the previous image. You could create a list to hold all your images:

data = [np.array(Image.open(v)) for v in data_list]
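Note that the comprehension produces a Python list, so it still needs to be stacked before it can be reshaped. A minimal sketch of that step, assuming Pillow is installed; load_images and its size parameter are hypothetical names, and the resize is an assumption to guard against images that are not already 128 x 128:

```python
import numpy as np
from PIL import Image


def load_images(paths, size=(128, 128)):
    """Load every path as RGB and stack into (count, height, width, 3)."""
    arrays = []
    for path in paths:
        with Image.open(path) as img:
            # convert("RGB") gives every image exactly 3 channels;
            # resize makes the shapes match so np.stack can succeed.
            arrays.append(np.asarray(img.convert("RGB").resize(size)))
    return np.stack(arrays)
```

With the question's dataset this would be called as load_images(glob('dataset\\training\\*\\*.jpg')), and the result could then be flattened with data.reshape(len(data), -1).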

However, loading all your images into memory before processing them is inefficient for large datasets, so it is good practice to load each image only when it is needed. Still, this should fix the issue at hand.
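One way to load images only when they are needed is a generator that yields one batch at a time. This is a rough sketch rather than a definitive implementation; iter_batches, load_fn, and batch_size are hypothetical names, and load_fn is assumed to return arrays of identical shape:

```python
import numpy as np


def iter_batches(paths, load_fn, batch_size=32):
    """Yield stacked arrays of at most batch_size images, loading lazily."""
    for start in range(0, len(paths), batch_size):
        chunk = paths[start:start + batch_size]
        # Only the images of this chunk are held in memory at once.
        yield np.stack([load_fn(p) for p in chunk])
```

With Pillow, load_fn could be something like lambda p: np.array(Image.open(p).convert("RGB")), and each yielded batch would then have the shape (batch, 128, 128, 3) for 128 x 128 images.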

Upvotes: 2
