Quan Hoang Nguyen

Reputation: 45

Get folder name and file name with digits from file path Python

Is there a way to get a specific folder name or file name that contains digits? I would like to check whether a file sits under a folder whose name consists only of digits. If it does, return those digits. If not, check whether the file name contains digits and return them; otherwise return the file name as the default. Example folder structure:

project
   |_62951
        |_test1.docx
   |_68512
        |_test2.docx
        |_minor tasks
             |_test3.docx
   |_Plumbing project
        |_69251
             |_test4.docx
        |_House address
             |_69251 plumb.docx
             |_test5.docx

Expecting result:

project code: 62951
filename: test1.docx
project code: 68512
filename: test2.docx
project code: 68512
filename: test3.docx
project code: 69251
filename: test4.docx
project code: 69251
filename: 69251 plumb.docx
project code: test5.docx
filename: test5.docx

I have gone through the os library and managed to get the file path and file name, but the path comes back as one whole string and I'm not sure how to break it down. Please share a solution to any part of the problem. Much appreciated! Current code:

import os

# walk through all folders and collect every file path
def get_files(source):
    matches = []
    for root, dirnames, filenames in os.walk(source):
        for filename in filenames:
            matches.append(os.path.join(root, filename))
    return matches


def parse(files):
    
    # run through all files
    folders = []
    for file in files:
        filepath,filename = os.path.split(file)
        filebreak = [filepath.split("\\")]
        print('project code: %s' % filebreak)
        print('file name: %s' % filename)


        #check file name

path = 'C:\\Users\\quan.nguyen\\***\\***\\Project testing files\\XML'

parse(get_files(path))

Result:

file name: sample_book - Copy.xml
project code: [['C:', 'Users', 'quan.nguyen', '***', '***', 'Project testing files', 'XML', 'folder3']]
file name: sample_book.xml
project code: [['C:', 'Users', 'quan.nguyen', '***', '***', 'Project testing files', 'XML', 'folder3']]
file name: test - Copy.docx
project code: [['C:', 'Users', 'quan.nguyen', '***', '***', 'Project testing files', 'XML', 'folder3']]
file name: test - Copy.pdf
project code: [['C:', 'Users', 'quan.nguyen', '***', '***', 'Project testing files', 'XML', 'folder3']]

The *** parts are redacted; they are the project code and name.

Upvotes: 0

Views: 368

Answers (1)

NotAName

Reputation: 4322

I'd just get the full file list and then keep the files whose names include a digit. To check, I load digits from the string module and test whether its set intersection with the characters of the file name is non-empty.
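The digit check on its own can be sketched as below. The helper name `has_digit` is hypothetical and only there for illustration.

```python
from string import digits


def has_digit(name):
    """Return True if at least one character of name is an ASCII digit."""
    # A non-empty intersection means name shares a character with "0123456789".
    return bool(set(name) & set(digits))
```

Note that this check is true for any digit anywhere in the name, so it accepts `test5.docx` as well as `69251 plumb.docx`.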

I don't think using the os module here is a good idea, since the code becomes much less readable as a result. pathlib is generally better for most file system operations.

from pathlib import Path
from string import digits

BASE_FOL = ''  # your base project folder
p = Path(BASE_FOL)
# keep only regular files whose name contains at least one digit
files = [f for f in p.rglob('*')
         if f.is_file() and set(f.name).intersection(digits)]

for f in files:
    # first path component below the base folder
    print(f'Project: {f.relative_to(p).parts[0]}\nFilename: {f.name}')

I don't have a similar folder structure to test with, but it should work.

EDIT: Fixed comprehension - forgot to add the check that all objects we're placing in the list are actual files, not folders.
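For the exact fallback order described in the question (all-digit folder first, then digits in the file name, then the file name itself), a minimal sketch could look like the following. It rests on two assumptions drawn from the expected output rather than stated by the asker: the nearest all-digit parent folder wins, and only a standalone run of digits in the file name counts, so `69251 plumb.docx` yields `69251` while `test5.docx` falls through to the default. The names `project_code` and `report` are hypothetical.

```python
import re
from pathlib import Path


def project_code(file_path, base):
    """Return the project code for file_path, relative to the base folder."""
    rel = Path(file_path).relative_to(base)

    # 1. Nearest parent folder (below base) whose name is all digits.
    for folder in reversed(rel.parts[:-1]):
        if folder.isdecimal():
            return folder

    # 2. Assumption: a standalone run of digits in the file name (not "test5").
    match = re.search(r"\b\d+\b", rel.stem)
    if match:
        return match.group()

    # 3. Default: the file name itself.
    return rel.name


def report(base):
    """Print the project code and file name for every file under base."""
    for f in Path(base).rglob("*"):
        if f.is_file():
            print(f"project code: {project_code(f, base)}")
            print(f"filename: {f.name}")
```

Calling `report("project")` on the example tree from the question should then print one project code and file name pair per file.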

Upvotes: 1
