Rose

Reputation: 205

Python: how to search for specific "string" in directory name (not individual file names)

I want to create a list of all the directory paths whose name matches a specific string, e.g. "04_DEM", so I can do further processing on the files inside those directories.

e.g.

INPUT

 C:\directory\NewZealand\04DEM\DEM_CD23_1232.tif
 C:\directory\Australia\04DEM\DEM_CD23_1233.tif
 C:\directory\NewZealand\05DSM\DSM_CD23_1232.tif
 C:\directory\Australia\05DSM\DSM_CD23_1232.tif

WANTED OUTPUT

 C:\directory\NewZealand\04DEM\
 C:\directory\Australia\04DEM\

This makes sure that only those files are processed, as some other files in the directories also have the same string "DEM" included in their filename, which I do not want to modify.

This is my bad attempt due to being a rookie with Py code

 import os

 for dirnames in os.walk('D:\Canterbury_2017Copy'):
     print dirnames
     if dirnames=='04_DEM' > listofdirectoriestoprocess.txt

 print "DONE CHECK TEXT FILE"

Upvotes: 1

Views: 628

Answers (3)

OneRaynyDay

Reputation: 3968

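First filter the paths with a regex using re, then use pathlib to cut each path off at the matching directory:

import os
import re
import pathlib

pattern = re.compile('04DEM')
# pattern.search() finds the pattern anywhere in the string.
# pattern.match() only matches at the start of the string;
# pattern.fullmatch() requires the whole string to match.
# Apply the correct function to your use case.
# list_of_files is assumed to be your existing list of path strings.
files = [s for s in list_of_files if pattern.search(s)]
all_pruned_paths = set()
for p in files:
    total = ""
    # Rebuild the path one component at a time and stop at the first match.
    for d in pathlib.Path(p).parts:
        total = os.path.join(total, d)
        if pattern.search(d):
            break
    all_pruned_paths.add(total)
result = list(all_pruned_paths)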

This is more flexible than using in, because a regex lets you form more complicated queries in the future.
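
If the regex should only ever be tested against folder names (never the file name), one possible variation is to walk each path's parent directories. This is just a sketch: matching_dirs and sample are names made up for the illustration, and PureWindowsPath is used so that the Windows-style paths from the question parse the same way on any OS.

```python
import re
from pathlib import PureWindowsPath

def matching_dirs(paths, pattern):
    """Return, in first-seen order, each directory whose own name matches pattern."""
    regex = re.compile(pattern)
    found = []
    for p in paths:
        # .parents runs from the containing folder up to the drive root;
        # reverse it so the outermost matching folder is the one reported.
        for parent in list(PureWindowsPath(p).parents)[::-1]:
            if regex.search(parent.name):
                if str(parent) not in found:
                    found.append(str(parent))
                break
    return found

sample = [
    r'C:\directory\NewZealand\04DEM\DEM_CD23_1232.tif',
    r'C:\directory\Australia\04DEM\DEM_CD23_1233.tif',
    r'C:\directory\NewZealand\05DSM\DSM_CD23_1232.tif',
    r'C:\directory\Australia\05DSM\DSM_CD23_1232.tif',
]
print(matching_dirs(sample, '04DEM'))
```

Because the file name itself is never one of the parents, a file called something like 04DEM_notes.tif sitting in a 05DSM folder would not be picked up.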

Upvotes: 2

Austin

Reputation: 26039

Use in to check if a required string is in another string.

This is one quick way:

new_list = []
for path in path_list:
    if '04DEM' in path:
        new_list.append(path)

Demo:

s = 'C:/directory/NewZealand/04DEM/DEM_CD23_1232.tif'
if '04DEM' in s:
    print(True)
# True

Make sure you use / or \\ as the directory separator (or a raw string such as r'C:\directory') instead of a single \, because in a normal string literal a backslash starts an escape sequence.
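
Note that the check above keeps the full file path and would also match a file whose own name contains the string. Here is a rough sketch of one way to get the folders instead, assuming forward-slash paths as in the demo. dir_list is a made-up name, and the last entry in path_list is a hypothetical file added only to show the difference.

```python
import os

path_list = [
    'C:/directory/NewZealand/04DEM/DEM_CD23_1232.tif',
    'C:/directory/Australia/04DEM/DEM_CD23_1233.tif',
    'C:/directory/NewZealand/05DSM/DSM_CD23_1232.tif',
    'C:/directory/Australia/05DSM/04DEM_notes.tif',  # hypothetical: string in the file name only
]

dir_list = []
for path in path_list:
    # Test only the folder part of the path, not the file name.
    folder = os.path.dirname(path)
    if '04DEM' in folder and folder not in dir_list:
        dir_list.append(folder)

print(dir_list)
```

This returns the immediate folder of each matching file, with duplicates dropped, so each directory appears once even if it holds many files.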

Upvotes: 2

jpp

Reputation: 164673

You can use os.path for this:

import os

lst = [r'C:\directory\NewZealand\04DEM\DEM_CD23_1232.tif',
       r'C:\directory\Australia\04DEM\DEM_CD23_1233.tif',
       r'C:\directory\NewZealand\05DSM\DSM_CD23_1232.tif',
       r'C:\directory\Australia\05DSM\DSM_CD23_1232.tif']

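def filter_paths(lst, x):
    # Index 3 is the fourth component (drive, 'directory', country, folder), so this
    # assumes every path has the same depth and uses the OS separator (Windows here).
    return [os.path.split(i)[0] for i in lst if os.path.normpath(i).split(os.sep)[3] == x]

res = filter_paths(lst, '04DEM')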

# ['C:\\directory\\NewZealand\\04DEM',
#  'C:\\directory\\Australia\\04DEM']
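
If you do not have the list of paths yet, the directories can also be collected straight from disk with os.walk, which is what the attempt in the question was reaching for. A minimal sketch, assuming you want folders whose name is exactly the target string (find_dirs is a made-up helper name):

```python
import os

def find_dirs(root, name):
    """Walk root and return the full path of every directory called name."""
    matches = []
    # os.walk yields a (dirpath, dirnames, filenames) tuple for each folder it visits.
    for dirpath, dirnames, filenames in os.walk(root):
        for d in dirnames:
            if d == name:
                matches.append(os.path.join(dirpath, d))
    return matches
```

With the folder from the question you would call something like find_dirs(r'D:\Canterbury_2017Copy', '04_DEM') and then write the returned list to a text file or loop over it directly.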

Upvotes: 2
