Asco 2

Reputation: 11

Why does the following code not successfully move desired files to the desired directories?

import os
import shutil
from collections import Counter as cnt


curr_path = os.getcwd()

ls = lambda path: [x for x in os.listdir(path) if x[0] != '.']

raw_files = [x for x in ls(curr_path) if '_' not in x]

filenames = []

folders = []

for f in raw_files: # sweeping out the '.', only having filenames

    try:
        i = f.index('.')
        f_new = f[0:i]

        filenames.append(f_new)
    except ValueError:
        filenames.append(f)

fname_freq = cnt(filenames)

for fname, freq in fname_freq.items():
    if freq > 1:
        folders.append(fname)


for fldr in folders:
    print(fldr+'\n')
    try:
        os.makedirs(fldr+'_')
        # adding a '_' to make the true folder name, i.e. foldername_ instead of foldername
    except OSError:
        pass

for f in raw_files:
    print("File being analyzed is: {fn} \n".format(fn=f))
    for fldr in folders:
        print("Folder to move stuff to is {fold}\n".format(fold=fldr))
        print("File {f}  being checked . . . ".format(f=f))


        if fldr in f and f[-1] != '_': 
            print("\t Moving file {fn} \n".format(fn=f))
            shutil.move(f, fldr+'_')



The above is a program, FileGrouper.py, that checks whether a directory contains multiple files sharing a base name (e.g. random.java and random.class). If it does, those files are moved into a directory named [insert shared name]_ to organize them.

This code in particular refuses to work for a specific set of files.

Note: I replaced my actual username and the timestamps with placeholders.

-rwxr-xr-x  1 UserName  staff   8480 Month 20 Time:Time another_ptr_func
-rw-r--r--@ 1 UserName  staff    324 Month 20 Time:Time another_ptr_func.c
-rwxr-xr-x  1 UserName  staff   8572 Month 20 Time:Time arrays1
-rw-r--r--  1 UserName  staff    321 Month 20 Time:Time arrays1.c
-rw-r--r--@ 1 UserName  staff   2058 Month 20 Time:Time file_grouper.py
-rwxr-xr-x  1 UserName  staff   8432 Month 20 Time:Time forloop
-rw-r--r--  1 UserName  staff    119 Month 20 Time:Time forloop.c
-rwxr-xr-x  1 UserName  staff   4248 Month 20 Time:Time gen_a
-rw-r--r--  1 UserName  staff     53 Month 20 Time:Time gen_a.c
-rwxr-xr-x  1 UserName  staff   8432 Month 20 Time:Time hello
-rw-r--r--  1 UserName  staff     65 Month 20 Time:Time hello.c
-rwxr-xr-x  1 UserName  staff   8432 Month 20 Time:Time hello2
-rw-r--r--  1 UserName  staff    140 Month 20 Time:Time hello2.c
-rw-r--r--@ 1 UserName  staff    343 Month 20 Time:Time pointer.c
-rwxr-xr-x  1 UserName  staff   8472 Month 20 Time:Time pointer_func
-rw-r--r--  1 UserName  staff    352 Month 20 Time:Time pointer_func.c
-rwxr-xr-x@ 1 UserName  staff  19340 Month 20 Time:Time pointer_hex
-rwxr-xr-x  1 UserName  staff   8432 Month 20 Time:Time switch
-rw-r--r--@ 1 UserName  staff    219 Month 20 Time:Time switch.c
-rwxr-xr-x  1 UserName  staff   8480 Month 20 Time:Time void_another_ptr_func
-rw-r--r--  1 UserName  staff    335 Month 20 Time:Time void_another_ptr_func.c

The above is the specific set of files that this code refuses to work for. I have tested the same code on the following set, which consists mostly of zero-size dummy files (created with touch):

-rw-r--r--  1 UserName  staff  1183 Month 20 Time:Time README.md
-rw-r--r--@ 1 UserName  staff  2067 Month 20 Time:Time file_grouper.py
-rw-r--r--  1 UserName  staff     0 Month 20 Time:Time random1
-rw-r--r--  1 UserName  staff     0 Month 20 Time:Time random1.txt
-rw-r--r--  1 UserName  staff     0 Month 20 Time:Time random2
-rw-r--r--  1 UserName  staff     0 Month 20 Time:Time random2.txt
-rw-r--r--  1 UserName  staff     0 Month 20 Time:Time random3
-rw-r--r--  1 UserName  staff     0 Month 20 Time:Time random3.txt
-rw-r--r--  1 UserName  staff     0 Month 20 Time:Time random4
-rw-r--r--  1 UserName  staff     0 Month 20 Time:Time random4.txt
-rw-r--r--  1 UserName  staff     0 Month 20 Time:Time random5.txt

The program worked successfully on these files, and generated the following:

.
├── README.md
├── file_grouper.py
├── random1_
│   ├── random1
│   └── random1.txt
├── random2_
│   ├── random2
│   └── random2.txt
├── random3_
│   ├── random3
│   └── random3.txt
├── random4_
│   ├── random4
│   └── random4.txt
└── random5.txt

As you can see, the program correctly ignored the lone random5.txt, since no other file shares its name, while the files that did share a name were grouped into a folder named [insert shared name]_.

Upvotes: 1

Views: 305

Answers (1)

Simon

Reputation: 5698

If by "refusing to work" you mean certain files are skipped, then that could be because of this line:

raw_files = [x for x in ls(curr_path) if '_' not in x]

This drops every file whose name contains an underscore, so files such as pointer_func and pointer_func.c are never considered.
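To see the effect on your directory, here is a small sketch that applies the same filter to the file names from your listing. The names are copied from the question; the variable `skipped` is only there for illustration.

```python
# File names taken from the directory listing in the question.
listing = [
    "another_ptr_func", "another_ptr_func.c", "arrays1", "arrays1.c",
    "file_grouper.py", "forloop", "forloop.c", "gen_a", "gen_a.c",
    "hello", "hello.c", "hello2", "hello2.c", "pointer.c",
    "pointer_func", "pointer_func.c", "pointer_hex", "switch", "switch.c",
    "void_another_ptr_func", "void_another_ptr_func.c",
]

# Same filter as in the question: keep only names without an underscore.
raw_files = [x for x in listing if "_" not in x]

# Everything the filter throws away (illustrative helper, not in the original).
skipped = [x for x in listing if "_" in x]

print("kept:   ", raw_files)
print("skipped:", skipped)
```

Every name in `skipped` is invisible to the rest of the script, so no folder is ever created for it.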

Another issue is that a file system will not allow a file and a directory with the same name in the same location, which you already handle by appending an underscore.
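A minimal sketch of that name clash, run in a temporary directory so nothing in your working directory is touched. The file name hello is borrowed from your listing; everything else is illustrative. Note that your script catches OSError and passes, so a clash like this would go unnoticed there.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A regular file named "hello", standing in for the compiled binary.
    file_path = os.path.join(tmp, "hello")
    open(file_path, "w").close()

    try:
        # Asking for a directory with the same name as the existing file.
        os.makedirs(file_path)
        clash = False
    except FileExistsError:  # a subclass of OSError
        clash = True

    # With the trailing underscore the name is free, so this succeeds.
    os.makedirs(file_path + "_")
    created = os.path.isdir(file_path + "_")

print("clash without underscore:", clash)
print("hello_ created:", created)
```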

I'd strongly suggest using the pathlib library; it makes your code more maintainable and readable:

from pathlib import Path


def file_grouper():
    # Path to group
    path = Path('.')

    # List all files
    ls = path.glob('*')

    # Map all stems (file names without extension) to their file names
    names = {}
    for x in ls:
        if x.is_file():
            if x.stem not in names:
                names[x.stem] = []
            names[x.stem].append(x)

    # Create and move
    for stem, values in names.items():
        if len(values) > 1:
            (path / (stem + '_')).mkdir(exist_ok=True)
            for value in values:
                value.rename(path / (stem + '_') / value.name)


file_grouper()
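One more thing that may be worth checking in the original script, offered as a hypothesis rather than a confirmed diagnosis: the test `fldr in f` is a substring match, so the folder name hello also matches hello2 and hello2.c. A file that has already been moved into hello_ would then be looked up again for hello2_, where the later shutil.move call would likely fail. Grouping by exact stem, as the pathlib version does, avoids this. The sketch below compares the two tests on plain strings; the dictionary names are made up for the example.

```python
from pathlib import PurePath

# A subset of the file names from the question.
files = ["hello", "hello.c", "hello2", "hello2.c"]
folders = ["hello", "hello2"]

# Substring test, as in the question: `fldr in f`.
substring_matches = {
    fldr: [f for f in files if fldr in f] for fldr in folders
}

# Exact comparison against the stem (file name without its last extension).
stem_matches = {
    fldr: [f for f in files if PurePath(f).stem == fldr] for fldr in folders
}

print("substring:", substring_matches)
print("stem:     ", stem_matches)
```

With the substring test, hello claims all four files; with the stem comparison, each folder gets only its own two.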

Upvotes: 1
