I'm using this dataset, http://weegee.vision.ucmerced.edu/datasets/landuse.html, on Google Colab.
I'm trying to load the images as a tf.data dataset using:
import tensorflow as tf
# Download and unzip images (Colab shell commands)
!wget http://weegee.vision.ucmerced.edu/datasets/UCMerced_LandUse.zip
!unzip UCMerced_LandUse.zip
print("DONE!")
filename = '/content/UCMerced_LandUse/Images'
data = tf.keras.preprocessing.image_dataset_from_directory(filename)
I get the following error:
TypeError: Input 'filename' of 'ReadFile' Op has type float32 that does not match expected type of string.
It finds the class folders (the labels), but none of the 100 images inside each folder. The images are .tif files.
The attached image shows the directory structure.
By contrast, loading a single image directly with:
tf.keras.preprocessing.image.load_img('/content/UCMerced_LandUse/Images/agricultural/agricultural00.tif', grayscale=False, color_mode="rgb", target_size=None, interpolation="nearest")
works fine and shows the image.
I have tried everything suggested in the post "TypeError: Input 'filename' of 'ReadFile' Op has type float32 that does not match expected type of string", such as renaming the files:
import glob
import os
all_dir = glob.glob('UCMerced_LandUse/Images/*/')
for class_dir in sorted(all_dir):
    name = class_dir.split('/')[2]  # class name, e.g. 'agricultural'
    for n, img in enumerate(sorted(glob.glob(class_dir + '*.tif'))):
        new_file = os.path.join(class_dir, '{}{}.tif'.format(name, n))
        os.rename(img, new_file)
I also tried some other "solutions" found online, but none of them fixed the problem.
Any clue about where I'm going wrong would be appreciated.
Thanks.
After looking into the problem, I realized that image_dataset_from_directory does not accept the .tif format. The documentation says:
Supported image formats: jpeg, png, bmp, gif. Animated gifs are truncated to the first frame.
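Files with other extensions are skipped, so no images were found at all. With zero file paths, the list of paths ends up as an empty float32 tensor, which is most likely why the ReadFile op complains that its 'filename' input is float32 instead of string. Instead of image_dataset_from_directory, I used the ImageDataGenerator class with its flow_from_directory method, which does accept .tif files.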
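To confirm that the file extensions are the issue, the class folders can be scanned and their extensions compared against the formats listed in the documentation. This is a minimal sketch: `extension_report`, `count_usable` and `DOCUMENTED_EXTS` are hypothetical names, and the extension set is an assumption derived from the documented list (jpeg, png, bmp, gif), not read from TensorFlow itself.

```python
import os
from collections import Counter

# Assumed extensions for the documented formats (jpeg, png, bmp, gif).
# This mirrors the docs; it is not TensorFlow's internal list.
DOCUMENTED_EXTS = {'.jpeg', '.jpg', '.png', '.bmp', '.gif'}


def extension_report(root):
    """Return {class_name: Counter of file extensions} for each subfolder of root."""
    report = {}
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files next to the class folders
        report[class_name] = Counter(
            os.path.splitext(f)[1].lower() for f in os.listdir(class_dir)
        )
    return report


def count_usable(report):
    """Count files whose extension is in DOCUMENTED_EXTS, summed over all classes."""
    return sum(
        n
        for exts in report.values()
        for ext, n in exts.items()
        if ext in DOCUMENTED_EXTS
    )
```

On this dataset, `count_usable(extension_report('/content/UCMerced_LandUse/Images'))` would be expected to return 0, since every image is a .tif file.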
First, I wrote code to split the images into train and validation folders:
import glob
import os
import random
import shutil
src_root = '/content/UCMerced_LandUse/Images'
dst_root = '/content/UCM_cust'
n_validation = 33  # images per class held out for validation
all_dir = sorted(glob.glob(os.path.join(src_root, '*/')))
for class_dir in all_dir:
    images = sorted(glob.glob(class_dir + '*.tif'))
    validation = random.sample(images, n_validation)
    train = [img for img in images if img not in validation]
    label_name = class_dir.split('/')[-2]  # e.g. 'agricultural'
    # create the per-class folders (and their parents) if they don't exist yet
    os.makedirs(os.path.join(dst_root, 'validation', label_name), exist_ok=True)
    os.makedirs(os.path.join(dst_root, 'train', label_name), exist_ok=True)
    for z in validation:
        shutil.copyfile(z, os.path.join(dst_root, 'validation', label_name, os.path.basename(z)))
    for t in train:
        shutil.copyfile(t, os.path.join(dst_root, 'train', label_name, os.path.basename(t)))
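The sampling step in the loop above could also be pulled out into a small function; passing a fixed seed makes the split reproducible between Colab sessions. This is a sketch, and `split_files` and its `seed` parameter are hypothetical names, not part of the original code.

```python
import random


def split_files(paths, n_validation, seed=None):
    """Randomly split a list of file paths into (train, validation) lists.

    Sorting first and using a dedicated random.Random(seed) makes the split
    depend only on the seed, not on the input order or the global random state.
    """
    ordered = sorted(paths)
    rng = random.Random(seed)
    validation = rng.sample(ordered, n_validation)
    held_out = set(validation)  # set lookup instead of scanning a list
    train = [p for p in ordered if p not in held_out]
    return train, validation
```

For a class folder `d`, it would be called as `train, validation = split_files(glob.glob(d + '*.tif'), 33, seed=42)`.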
Then, following a tutorial from the Keras blog, I loaded the data with the following code:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
batch_size = 32
# rescale pixel values to [0, 1]; no augmentation is configured here
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)
# this generator reads the images in the class subfolders of
# '/content/UCM_cust/train' and yields batches of them indefinitely
train_generator = train_datagen.flow_from_directory(
    '/content/UCM_cust/train',  # this is the target directory
    target_size=(256, 256),
    batch_size=batch_size,
    class_mode='categorical')
# this is a similar generator, for validation data
validation_generator = test_datagen.flow_from_directory(
    '/content/UCM_cust/validation',
    target_size=(256, 256),
    batch_size=batch_size,
    class_mode='categorical')
Train: Found 1407 images belonging to 21 classes.
Validation: Found 693 images belonging to 21 classes.
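These counts follow directly from the split: each of the 21 classes has 100 images, 33 of which go to validation, leaving 67 per class for training. A quick sanity check, where `expected_counts` is a hypothetical helper:

```python
def expected_counts(n_classes, images_per_class, n_validation):
    """Return (train_total, validation_total) for a fixed per-class hold-out."""
    n_train = images_per_class - n_validation
    return n_classes * n_train, n_classes * n_validation


# UC Merced: 21 classes x 100 images each, 33 held out per class
print(expected_counts(21, 100, 33))  # (1407, 693)
```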
I hope this is useful to other people.