pankaj mishra

Reputation: 2615

Finding duplicate folders and renaming them by prefixing the parent folder name in Python

I have a folder structure as shown below

[Image: Folder_structure]

There are several subfolders with duplicate names. Whenever a duplicate subfolder name is encountered, I want it to be prefixed with its parent folder's name.

E.g. DIR2>SUBDIR1 should be renamed to DIR2>DIR2_SUBDIR1. Once the folder is renamed to DIR2_SUBDIR1, the files inside it should get the same prefix, e.g. DIR2>SUBDIR1>subdirtst2.txt should become DIR2>DIR2_SUBDIR1>DIR2_subdirtst2.txt.

What have I done so far?

I have simply added all the folder names to a list. Beyond that, I can't figure out an elegant way to do this task.

import os
list_dir=[]
for root, dirs, files in os.walk(os.getcwd()):
    for file in files:
        if file.endswith(".txt"):
            path_file = os.path.join(root)
            print(path_file)
            list_dir.append(path_file)

Upvotes: 0

Views: 216

Answers (1)

shriakhilc

Reputation: 3000

The following snippet should be able to achieve what you desire. I've written it in a way that clearly shows what is being done, so I'm sure there might be tweaks to make it more efficient or elegant.

import os

cwd = os.getcwd()

# None means "no directory processed yet"; an empty set is a valid intersection result
to_be_renamed = None
for rootdir in next(os.walk(cwd))[1]:
    subdirs = set(next(os.walk(os.path.join(cwd, rootdir)))[1])
    to_be_renamed = subdirs if to_be_renamed is None else to_be_renamed & subdirs
to_be_renamed = to_be_renamed or set()

for rootdir in next(os.walk(cwd))[1]:
    subdirs = next(os.walk(os.path.join(cwd, rootdir)))[1]
    for s in subdirs:
        if s in to_be_renamed:
            srcpath = os.path.join(cwd, rootdir, s)
            dstpath = os.path.join(cwd, rootdir, rootdir+'_'+s)
            # First rename files
            for f in next(os.walk(srcpath))[2]:
                os.rename(os.path.join(srcpath, f), os.path.join(srcpath, rootdir+'_'+f))
            # Now rename dir
            os.rename(srcpath, dstpath)
            print('Renamed', s, 'and files')

Here, cwd stores the path to the dir that contains DIR1, DIR2 and DIR3. The first loop checks all immediate subdirectories of these 'root directories' and builds a set of the subdirectory names they share by repeatedly taking the intersection (&). Note that this means a name only counts as a duplicate if it appears in every one of the root directories.
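
If "duplicate" should instead mean "appears under two or more parents" rather than under all of them, counting the names is one option. This is only a sketch of that alternative, not what the snippet above does; the function name find_duplicate_subdirs is made up for illustration.

```python
import os
from collections import Counter


def find_duplicate_subdirs(base):
    """Return subdirectory names that occur under two or more parents in base."""
    counts = Counter()
    for parent in next(os.walk(base))[1]:
        # Each parent contributes every one of its subdirectory names once
        counts.update(next(os.walk(os.path.join(base, parent)))[1])
    return {name for name, n in counts.items() if n > 1}
```

The returned set could then be used in place of to_be_renamed in the second loop.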

Then it runs another loop, checks if the subdirectory is to be renamed and finally uses the os.rename function to rename it and all the files it contains.

os.walk() returns a 3-tuple with path to the directory, the directories in it, and the files in it, at each step. It 'walks' the tree in either a top-down or bottom-up manner, and doesn't stop at one iteration.

So, the built-in next() function is used to fetch only the first result (the one for the directory passed in), after which [1] or [2] is used to get its directories or files respectively.
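
To see what that first result looks like, here is a small throwaway demo. The directory and file names are invented for the example, and it builds its tree in a temporary directory so nothing real is touched.

```python
import os
import tempfile

# Build a tiny throwaway tree: base/DIR1/SUBDIR1 plus one file directly in base
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "DIR1", "SUBDIR1"))
open(os.path.join(base, "note.txt"), "w").close()

# next() pulls only the first 3-tuple, i.e. the entry for base itself
dirpath, dirnames, filenames = next(os.walk(base))
print(dirnames)   # immediate subdirectories of base
print(filenames)  # files directly inside base
```

SUBDIR1 does not show up in dirnames because it is one level deeper; it would only appear in a later iteration of the walk.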

If you want to rename not just files, but all items in the subdirectories being renamed, then replace next(os.walk(srcpath))[2] with os.listdir(srcpath). This list contains both files and directories.
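
As a rough sketch of that variant, the inner renaming step could be pulled into a helper like the one below. The name prefix_entries is hypothetical, and the helper assumes no entry with the prefixed name already exists in the directory.

```python
import os


def prefix_entries(srcpath, prefix):
    """Prefix every entry (file or directory) directly inside srcpath."""
    # os.listdir returns a plain list, so renaming while looping over it is safe
    for name in os.listdir(srcpath):
        new_name = prefix + "_" + name
        os.rename(os.path.join(srcpath, name), os.path.join(srcpath, new_name))
```

It only touches the top level of srcpath; anything nested deeper keeps its name.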

NOTE: The reason I'm computing the list of duplicated names first in a separate loop is so that the first occurrence is not left unchanged. Renaming in the same loop will miss that first one.
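
A quick in-memory illustration of that point, using a made-up dictionary in place of the real folder tree (no files are touched):

```python
# Hypothetical layout: every parent contains a subdirectory called SUBDIR1
tree = {"DIR1": ["SUBDIR1"], "DIR2": ["SUBDIR1"], "DIR3": ["SUBDIR1"]}

# Single pass: a name is only flagged once it has been seen before
seen, single_pass = set(), []
for parent, subdirs in tree.items():
    for s in subdirs:
        if s in seen:
            single_pass.append((parent, s))
        seen.add(s)

# Two passes: collect the shared names first, then flag every occurrence
shared = set.intersection(*(set(v) for v in tree.values()))
two_pass = [(p, s) for p, subdirs in tree.items() for s in subdirs if s in shared]

print(single_pass)  # DIR1's copy is missing here
print(two_pass)     # all three parents are included
```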

Upvotes: 1
