Abdulaziz G

Reputation: 13

Is there any way to match a JSON file with a dataset of images in Python?

I'm working on a machine learning project (image classification), and I found a dataset that consists of two parts:

  1. The images (20,000 images). They are numbered from 1 to 20,000 and are not sorted into classes.
  2. A JSON file that holds the information and classification of the images (12 classes). The JSON file is structured as follows:
{
  "<image_number>": {
    "image_filepath": "images/<image_number>.jpg", 
    "anomaly_class": "<class_name>"
  },
  ...
}

I'm trying to read the JSON file and split the dataset so that I can deal with each class individually, then take 80% of each class as the training set and the remaining 20% as the testing set.

I tried to find a way to match the JSON file with the images so that I can sort the images into one folder per class and then divide each class into training and testing sets.

Can anyone help me with that?

Thank you!

Upvotes: 0

Views: 1588

Answers (2)

jhylands

Reputation: 1014

Something like the following would create folders for each of the classes and then move the images into them.

import json
import os
from os import path

# Open the JSON file containing the classifications
# (adjust the file name to match your dataset)
with open("clasification.json", "r") as f:
    classification = json.load(f)

# Collect the distinct class names
classes = {entry["anomaly_class"] for entry in classification.values()}

# Make one folder per class; exist_ok avoids an error if the script is re-run
for c in classes:
    os.makedirs(c, exist_ok=True)

# Move each image listed in the JSON into the folder named after its class
for image_number, image_data in classification.items():
    destination = path.join(image_data["anomaly_class"], "{}.jpg".format(image_number))
    os.rename(image_data["image_filepath"], destination)
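
The snippet above sorts the images by class but does not do the 80/20 split asked for in the question. A minimal sketch of a per-class split follows, assuming the JSON has already been loaded into a dict of the shape shown in the question. The function name `split_per_class`, its `train_fraction` and `seed` parameters, and the example class names are hypothetical and not part of the dataset. The paths are shuffled with a fixed seed so that the split does not simply follow the image numbering and stays the same between runs.

```python
import random


def split_per_class(classification, train_fraction=0.8, seed=0):
    """Return (train, test) dicts mapping class name -> list of image paths."""
    # Group the image paths by their class
    per_class = {}
    for entry in classification.values():
        per_class.setdefault(entry["anomaly_class"], []).append(entry["image_filepath"])

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = {}, {}
    for class_name, paths in per_class.items():
        shuffled = sorted(paths)  # sort first so the result does not depend on JSON order
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        train[class_name] = shuffled[:cut]
        test[class_name] = shuffled[cut:]
    return train, test


# Tiny made-up example in the same shape as the JSON from the question
example = {
    str(n): {
        "image_filepath": "images/{}.jpg".format(n),
        "anomaly_class": "class_a" if n <= 10 else "class_b",
    }
    for n in range(1, 16)
}
train, test = split_per_class(example)
print({c: len(p) for c, p in train.items()})
print({c: len(p) for c, p in test.items()})
```

The two returned dicts can then be used to decide whether each file goes into a training or a testing folder. Note that `int()` rounds down, so a class with very few images may end up with an empty training list.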

Upvotes: 0

Miguel Alorda

Reputation: 692

Something like this should work:

import json
from pathlib import Path

currDir = Path(__file__).resolve().parent
# Directories the images will be moved to
imagesDir = currDir / 'images'
testingDir = imagesDir / 'testing'
trainingDir = imagesDir / 'training'

# Load data
# This has to be the path to the file containing the data
# I assumed it is in the same directory as this script
infoFilePath = currDir / 'data.json'
with infoFilePath.open() as f:
    infoPerImage = json.load(f)

# Separate into classes: class name -> list of image paths
infoPerClass = {}
for imageNumber, imageInfo in infoPerImage.items():
    imageClass = imageInfo['anomaly_class']
    imagePath = imageInfo['image_filepath']
    infoPerClass.setdefault(imageClass, []).append(imagePath)

# Create directories for the classes
# (exist_ok=True so re-running the script does not raise FileExistsError)
for imageClass in infoPerClass:
    (trainingDir / imageClass).mkdir(parents=True, exist_ok=True)
    (testingDir / imageClass).mkdir(parents=True, exist_ok=True)

# Separate into training (first 80%) and testing (remaining 20%) images
trainingImages = {}
testingImages = {}
for imageClass, imagePaths in infoPerClass.items():
    upperLimit = int(len(imagePaths) * 0.8)
    trainingImages[imageClass] = imagePaths[:upperLimit]
    testingImages[imageClass] = imagePaths[upperLimit:]


def moveImagesToTheirDir(imagesDict, imagesBasePath):
    for imageClass, imagePaths in imagesDict.items():
        for imagePath in imagePaths:
            # The JSON paths are relative, so resolve them against the script's directory
            imageSrc = currDir / imagePath
            imageDest = imagesBasePath / imageClass / imageSrc.name
            imageSrc.rename(imageDest)


moveImagesToTheirDir(trainingImages, trainingDir)
moveImagesToTheirDir(testingImages, testingDir)
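
Moving the files changes the original dataset, so the paths in the JSON file no longer match after the first run. If that is a concern, a copy-based variant may be safer. The following is only a sketch: the function name `copy_images_to_dir` and its `source_root` and `dest_root` parameters are hypothetical, and the demo uses empty placeholder files in a temporary directory instead of the real images.

```python
import shutil
import tempfile
from pathlib import Path


def copy_images_to_dir(images_dict, source_root, dest_root):
    """Copy each image into dest_root/<class>/, leaving the originals untouched."""
    copied = []
    for image_class, image_paths in images_dict.items():
        class_dir = Path(dest_root) / image_class
        class_dir.mkdir(parents=True, exist_ok=True)
        for image_path in image_paths:
            src = Path(source_root) / image_path
            dest = class_dir / src.name
            shutil.copy2(src, dest)  # copy2 also preserves file metadata
            copied.append(dest)
    return copied


# Self-contained demo with two empty placeholder files (not real images)
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "images").mkdir()
    for n in (1, 2):
        (root / "images" / "{}.jpg".format(n)).touch()
    demo = {"class_a": ["images/1.jpg"], "class_b": ["images/2.jpg"]}
    copied = copy_images_to_dir(demo, root, root / "training")
    copied_names = sorted(p.relative_to(root).as_posix() for p in copied)
    originals_kept = all(
        (root / "images" / "{}.jpg".format(n)).exists() for n in (1, 2)
    )
print(copied_names)
```

With the script above, this would be called once with `trainingImages` and once with `testingImages`, at the cost of storing every image twice.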

Upvotes: 0
