tejasv agarwal

Reputation: 13

Python: program to sort files according to entries in csv file

import os, unicodecsv as csv
# open and store the csv file
IDs = {}
with open('labels.csv','rb') as csvfile:
    timeReader = csv.reader(csvfile, delimiter = ',')
    # build dictionary with associated IDs
    for row in timeReader:
        IDs[row[0]] = row[1]
# move files
path = 'train/'
tmpPath = 'train2/'
for oldname in os.listdir(path):
    # ignore files in path which aren't in the csv file
    if oldname in IDs:
        try:
            os.rename(os.path.join(path, oldname), os.path.join(tmpPath, IDs[oldname]))
        except:
            print 'File ' + oldname + ' could not be renamed to ' + IDs[oldname] + '!'

I am trying to sort my files according to this CSV file, but the file contains many ids with the same name. Is there a way to move files with the same name into one folder, or to add a number in front of a file name if a file with the same name already exists in the directory?

Example:

id                   name
001232131hja1.jpg     golden_retreiver
0121221122ld.jpg      black_hound
0232113222kl.jpg      golden_retreiver
0213113jjdsh.jpg      alsetian
05hkhdsk1233a.jpg     black_hound

What I actually want is to move all the files whose id corresponds to golden_retreiver into one folder, and likewise for every other name.

Upvotes: 1

Views: 1114

Answers (1)

Hai Vu

Reputation: 40783

Based on what you describe, here is my approach:

import csv
import os

SOURCE_ROOT = 'train'
DEST_ROOT = 'train2'

with open('labels.csv') as infile:
    next(infile)  # Skip the header row
    reader = csv.reader(infile)
    seen = set()
    for dogid, breed in reader:
        # Create a new directory if needed
        if breed not in seen:
            os.mkdir(os.path.join(DEST_ROOT, breed))
            seen.add(breed)

        src = os.path.join(SOURCE_ROOT, dogid + '.jpg')
        dest = os.path.join(DEST_ROOT, breed, dogid + '.jpg')

        try:
            os.rename(src, dest)
        except OSError as e:  # WindowsError exists only on Windows; OSError is its portable base class
            print(e)

Notes

  • For every line in the data file, I create the breed directory at the destination. I use the set seen to make sure that I only create each directory once.
  • After that, it is a trivial matter of moving the files into place.
  • One possible move error: the file does not exist in the source directory, in which case the code just prints the error and carries on.
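
For reference, the same idea can be wrapped in a function. This is only a minimal sketch, not a drop-in replacement. It assumes Python 3 and assumes the first CSV column already holds the full file name including the extension, as in the example table in the question; if your ids lack the extension, append it as in the code above. The function name sort_into_folders and the has_header parameter are made up for illustration. It uses os.makedirs with exist_ok=True in place of the seen set, so rerunning it does not fail on directories that already exist, and shutil.move in place of os.rename.

```python
import csv
import os
import shutil


def sort_into_folders(csv_path, source_root, dest_root, has_header=True):
    """Move each file listed in csv_path into dest_root/<label>/.

    Hypothetical helper: assumes column 0 is the file name and
    column 1 is the label. Returns the names that could not be moved.
    """
    failed = []
    with open(csv_path, newline='') as infile:
        reader = csv.reader(infile)
        if has_header:
            next(reader, None)  # skip the header row, if there is one
        for row in reader:
            if len(row) < 2:
                continue  # ignore blank or malformed rows
            filename, label = row[0], row[1]
            label_dir = os.path.join(dest_root, label)
            # exist_ok=True makes repeated runs safe, so no 'seen' set is needed
            os.makedirs(label_dir, exist_ok=True)
            try:
                shutil.move(os.path.join(source_root, filename),
                            os.path.join(label_dir, filename))
            except OSError as e:  # e.g. the file is missing from source_root
                print(e)
                failed.append(filename)
    return failed
```

With the layout from the question, a call would look like sort_into_folders('labels.csv', 'train', 'train2'). Returning the failed names lets you check afterwards which ids in the CSV had no matching file.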

Upvotes: 1
