Reputation: 45
Is there a way to get a specific folder name or file name containing digits? I would like to check whether a file sits in a folder whose name consists only of digits. If it does, return those digits. If not, check whether the file name contains digits and return them; otherwise return the file name as the default. Example folder:
project
|_62951
|_test1.docx
|_68512
|_test2.docx
|_minor tasks
|_test3.docx
|_Plumbing project
|_69251
|_test4.docx
|_House address
|_69251 plumb.docx
|_test5.docx
Expected result:
project code: 62951
filename: test1.docx
project code: 68512
filename: test2.docx
project code: 68512
filename: test3.docx
project code: 69251
filename: test4.docx
project code: 69251
filename: 69251 plumb.docx
project code: test5.docx
filename: test5.docx
I have gone through the os library and managed to get the file path and file name, but the path comes back as one whole string and I'm not sure how to break it down. A solution to any part of the problem would be much appreciated. Current code:
import os

# run through all folders and collect the full path of every file
def get_files(source):
    matches = []
    for root, dirnames, filenames in os.walk(source):
        for filename in filenames:
            matches.append(os.path.join(root, filename))
    return matches

def parse(files):
    # run through all files
    folders = []
    for file in files:
        filepath, filename = os.path.split(file)
        filebreak = [filepath.split("\\")]
        print('project code: %s' % filebreak)
        print('file name: %s' % filename)
        # check file name

path = 'C:\\Users\\quan.nguyen\\***\\***\\Project testing files\\XML'
parse(get_files(path))
Result:
file name: sample_book - Copy.xml
project code: [['C:', 'Users', 'quan.nguyen', '***', '***', 'Project testing files', 'XML', 'folder3']]
file name: sample_book.xml
project code: [['C:', 'Users', 'quan.nguyen', '***', '***', 'Project testing files', 'XML', 'folder3']]
file name: test - Copy.docx
project code: [['C:', 'Users', 'quan.nguyen', '***', '***', 'Project testing files', 'XML', 'folder3']]
file name: test - Copy.pdf
project code: [['C:', 'Users', 'quan.nguyen', '***', '***', 'Project testing files', 'XML', 'folder3']]
The *** parts are redacted information: the project code and name.
Upvotes: 0
Views: 368
Reputation: 4322
I'd just get the full file list and then filter for the files that include a digit in the file name. To check that, I'm loading digits from the string module and checking the set intersection with the characters of the name.
I don't think using the os module here is a good idea, since the code becomes much less readable as a result; pathlib is generally better for most file system operations.
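from pathlib import Path
from string import digits

BASE_FOL = ''  # your base project folder
p = Path(BASE_FOL)
# keep only regular files whose name contains at least one digit
files = [f for f in p.rglob('*')
         if f.is_file() and set(f.name).intersection(list(digits))]
for f in files:
    # f.parts[0] is the first component of the path as built from BASE_FOL
    print(f'Project: {f.parts[0]}\nFilename: {f.name}')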
I don't have any similar folder structure to test but it should work.
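On breaking the path down: if the base folder is given as an absolute path, the first entry of parts is the drive rather than the folder directly under the base, so it may help to make each path relative to the base first. A minimal sketch, using PureWindowsPath so that it runs on any OS without touching the disk; the C:\work\project paths are made up for illustration:

```python
from pathlib import PureWindowsPath

# Hypothetical paths, used only for illustration.
base = PureWindowsPath(r"C:\work\project")
file = PureWindowsPath(r"C:\work\project\68512\minor tasks\test3.docx")

# On an absolute path the first part is the drive plus root.
drive = file.parts[0]

# Relative to the base, the parts are the folders followed by the file name.
relative = file.relative_to(base)
folders = relative.parts[:-1]
filename = relative.name

print(drive, folders, filename)
```

With a real folder the same relative_to call works on the Path objects returned by rglob.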
EDIT: Fixed comprehension - forgot to add the check that all objects we're placing in the list are actual files, not folders.
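To get closer to the output the question asks for, the three rules could be applied in order: the nearest parent folder made up only of digits, then digits in the file name, then the file name itself. This is a rough sketch rather than a tested solution. The names project_code, report and MIN_DIGITS are hypothetical, and the two-digit minimum is an assumption made so that test5.docx falls back to its file name as in the expected result:

```python
import re
from pathlib import Path, PurePath

MIN_DIGITS = 2  # assumption: a lone digit such as the "5" in test5.docx is not a code


def project_code(relative_path, min_digits=MIN_DIGITS):
    """Return the project code for a file path given relative to the base folder."""
    path = PurePath(relative_path)
    # 1. Nearest enclosing folder whose name consists only of digits.
    for folder in reversed(path.parts[:-1]):
        if re.fullmatch(r"[0-9]+", folder):
            return folder
    # 2. Otherwise the first run of at least min_digits digits in the file name.
    match = re.search(r"[0-9]{%d,}" % min_digits, path.name)
    if match:
        return match.group()
    # 3. Otherwise fall back to the file name itself.
    return path.name


def report(base):
    """Print project code and file name for every file below base."""
    base = Path(base)
    for f in sorted(base.rglob("*")):
        if f.is_file():
            print(f"project code: {project_code(f.relative_to(base))}")
            print(f"filename: {f.name}")
```

Calling report with the base project folder should print one pair of lines per file. If project codes always have a fixed length, MIN_DIGITS can be raised to match.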
Upvotes: 1