Josh Payne

Reputation: 373

Creating custom datasets

I imagine this is a broadly applicable question, but I'm trying to create a dataset for a particular competition that involves flying a UAV over a field containing cardboard geometric shapes with alphanumeric characters painted on them. The objective is to detect and classify both the shapes and the characters.

Currently, I'm using SURF to detect the shape, K-means to segment the shape and the character, and a convolutional neural network to classify each. However, training data has become the bottleneck: I can't produce a training set that yields a model that performs well on real data.
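The K-means step in that pipeline can be sketched as colour clustering of a cropped detection into three groups (roughly background, shape and character). This is a plain NumPy sketch, not the asker's code; it assumes the three regions have distinct colours and that the crop contains at least `k` distinct colours. `kmeans_segment` is a hypothetical name, and OpenCV's `cv2.kmeans` can do the same job.

```python
import numpy as np


def kmeans_segment(image, k=3, iters=20, seed=0):
    """Cluster the pixels of an H x W x C image by colour with plain K-means.

    Returns an H x W array of cluster indices and the k cluster centres.
    Assumes the image contains at least k distinct colours.
    """
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(np.float64)
    rng = np.random.default_rng(seed)
    # Initialise the centres with k distinct pixel colours picked at random
    unique = np.unique(pixels, axis=0)
    centres = unique[rng.choice(len(unique), size=k, replace=False)]

    def assign_pixels(centres):
        # Index of the nearest centre for every pixel (squared Euclidean distance)
        dists = ((pixels[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    for _ in range(iters):
        assign = assign_pixels(centres)
        # Move each centre to the mean of its pixels; keep it if the cluster is empty
        new_centres = np.array([
            pixels[assign == j].mean(axis=0) if np.any(assign == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return assign_pixels(centres).reshape(h, w), centres
```

Which cluster is the character still has to be decided afterwards, for example by assuming the cluster touching the crop border is background and the smaller remaining one is the character; that heuristic is an assumption.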

What I've Tried

What I Haven't Tried

Does anyone have any advice on how to create a custom dataset for a task like this?

Upvotes: 3

Views: 1701

Answers (2)

Josh Payne

Reputation: 373

One thing we learned is that when generating a custom dataset, you should incorporate as many "real" elements as possible (e.g., handwritten characters from EMNIST, backgrounds from Google Images). Data augmentation, for example with Keras' ImageDataGenerator class, is especially important if part of the dataset has to be generated.
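The augmentation idea can be illustrated with a small NumPy-only sketch. It is not Keras' ImageDataGenerator (which exposes similar transforms through arguments such as `rotation_range`, `width_shift_range` and `brightness_range`); the function name `augment` and all ranges below are illustrative assumptions. Horizontal flips are deliberately left out because mirroring changes the identity of characters such as b and d.

```python
import numpy as np


def augment(image, rng, rotate=True, max_shift=0.1):
    """Return a randomly transformed copy of an H x W x C uint8 image.

    NumPy-only stand-in for the kind of transforms ImageDataGenerator applies;
    the transforms and their ranges are illustrative assumptions.
    """
    out = image.astype(np.float32)
    if rotate:
        # Random multiple of 90 degrees: fine for shapes, but for characters it
        # can make pairs such as 6/9 ambiguous, so it can be switched off
        out = np.rot90(out, k=int(rng.integers(4)), axes=(0, 1))
    # Random translation of up to max_shift of each side, padding with edge pixels
    h, w = out.shape[:2]
    my, mx = int(h * max_shift), int(w * max_shift)
    dy, dx = int(rng.integers(-my, my + 1)), int(rng.integers(-mx, mx + 1))
    padded = np.pad(out, ((abs(dy), abs(dy)), (abs(dx), abs(dx)), (0, 0)), mode="edge")
    out = padded[abs(dy) - dy:abs(dy) - dy + h, abs(dx) - dx:abs(dx) - dx + w]
    # Brightness jitter: scale every pixel by a random factor in [0.7, 1.3)
    out = out * rng.uniform(0.7, 1.3)
    # Additive Gaussian noise as a rough model of sensor noise
    out = out + rng.normal(0.0, 5.0, size=out.shape)
    return np.clip(out, 0, 255).astype(np.uint8)
```

Calling `augment(img, np.random.default_rng(seed))` several times per generated sample multiplies the effective dataset size; passing a seeded generator keeps runs reproducible.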

We ended up using the EMNIST Balanced dataset and saw good results with it for alphanumeric classification. For localization of the geometric shapes, we used the YOLO (https://pjreddie.com/darknet/yolo/) deep learning algorithm instead of SURF. To create the custom dataset, we first placed EMNIST characters onto generated geometric shapes, then placed those shapes onto aerial images of fields scraped from Google.
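The compositing step (characters onto shapes, shapes onto aerial backgrounds) can be sketched in NumPy. Assumptions, none of which come from the answer: shapes and characters arrive as boolean masks (a character mask could be an EMNIST glyph thresholded at some level), colours are passed in directly, and the label is written in the normalised `class x_center y_center width height` text format used for Darknet/YOLO training. `composite_sample` and `circle_mask` are hypothetical helpers, not the answerer's code.

```python
import numpy as np


def circle_mask(size):
    """Boolean mask of a filled circle inscribed in a size x size square."""
    yy, xx = np.mgrid[:size, :size]
    r = (size - 1) / 2
    return (yy - r) ** 2 + (xx - r) ** 2 <= r ** 2


def composite_sample(background, shape_mask, char_mask, shape_color, char_color,
                     rng, class_id=0):
    """Paint a shape with a character on it onto a copy of an aerial background.

    background: H x W x 3 uint8 image; shape_mask: sh x sw bool;
    char_mask: ch x cw bool, small enough to sit inside the shape.
    Returns (image, label), where label is a YOLO-style text line
    "class x_center y_center width height" normalised to [0, 1].
    """
    H, W = background.shape[:2]
    sh, sw = shape_mask.shape
    # Random top-left corner that keeps the whole shape inside the image
    y = int(rng.integers(0, H - sh + 1))
    x = int(rng.integers(0, W - sw + 1))
    out = background.copy()
    region = out[y:y + sh, x:x + sw]  # a view, so writes go into `out`
    region[shape_mask] = shape_color
    # Centre the character on the shape
    ch, cw = char_mask.shape
    top, left = (sh - ch) // 2, (sw - cw) // 2
    region[top:top + ch, left:left + cw][char_mask] = char_color
    # Tight bounding box of the shape mask, converted to normalised centre/size
    rows = np.flatnonzero(shape_mask.any(axis=1))
    cols = np.flatnonzero(shape_mask.any(axis=0))
    y0, y1 = y + rows[0], y + rows[-1] + 1
    x0, x1 = x + cols[0], x + cols[-1] + 1
    label = (f"{class_id} {(x0 + x1) / 2 / W:.6f} {(y0 + y1) / 2 / H:.6f} "
             f"{(x1 - x0) / W:.6f} {(y1 - y0) / H:.6f}")
    return out, label
```

In practice the shape type, size, colours and position would be randomised per sample, and the result passed through augmentation before training.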

Upvotes: 0

Sakhri Houssem

Reputation: 1063

This is my dataSetGenerator; maybe it will help you generate your own dataset:

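import os
from glob import glob

import cv2
import numpy as np


def dataSetGenerator(path, resize=False, resize_to=224, percentage=100):
    """
    Load a folder-per-class image dataset into memory.

    DataSetsFolder
      |
      |----------class-1
      |        .   |-------image-1
      |        .   |         .
      |        .   |         .
      |        .   |         .
      |        .   |-------image-n
      |        .
      |-------class-n

    :param path: <path>/DataSetsFolder
    :param resize: if True, resize every image to resize_to x resize_to
    :param resize_to: side length in pixels used when resize is True
    :param percentage: percentage (0-100) of the images to keep, sampled at random
    :return: images, one-hot labels, class names
    """
    # Only sub-directories are classes; sort so the class-to-index mapping is stable
    classes = sorted(d for d in os.listdir(path)
                     if os.path.isdir(os.path.join(path, d)))
    image_list = []
    labels = []
    for index, classe in enumerate(classes):
        # Only .tif files are read; change the pattern for other formats
        for filename in glob(os.path.join(path, classe, '*.tif')):
            image = cv2.imread(filename)
            if image is None:  # cv2.imread returns None for unreadable files
                continue
            if resize:
                image = cv2.resize(image, (resize_to, resize_to))
            image_list.append(image)
            label = np.zeros(len(classes))
            label[index] = 1  # one-hot encoding of the class index
            labels.append(label)
    # Shuffle and keep only the requested percentage of the samples
    indice = np.random.permutation(len(image_list))[:int(len(image_list) * percentage / 100)]
    # Without resize, images of different sizes cannot be stacked into one array
    return (np.array([image_list[x] for x in indice]),
            np.array([labels[x] for x in indice]),
            np.array(classes))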
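If the dataset does not fit in memory, or the images have different sizes and you don't want to resize them up front, one option is to index the files first and read the images lazily in batches. The sketch below assumes the same folder-per-class layout; `index_dataset` is a hypothetical name, and it returns integer class indices rather than the one-hot vectors dataSetGenerator builds.

```python
import os
from glob import glob

import numpy as np


def index_dataset(path, pattern="*.tif", percentage=100, seed=None):
    """List (file path, class index) pairs for a folder-per-class dataset.

    Same folder layout as dataSetGenerator, but images are not loaded, so
    memory use stays small and images can be read lazily in batches.
    """
    # Sub-directories are the classes, sorted for a stable class-to-index mapping
    classes = sorted(d for d in os.listdir(path)
                     if os.path.isdir(os.path.join(path, d)))
    files, labels = [], []
    for index, name in enumerate(classes):
        for filename in sorted(glob(os.path.join(path, name, pattern))):
            files.append(filename)
            labels.append(index)
    # Shuffle and keep only the requested percentage of the files
    rng = np.random.default_rng(seed)
    keep = rng.permutation(len(files))[:int(len(files) * percentage / 100)]
    return [files[i] for i in keep], np.array(labels, dtype=np.int64)[keep], classes
```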

Upvotes: 4
