Reputation: 2207
I'm trying to find a way to compare each directory path against a given regular expression to find out whether it matches that pattern.
I have the following list of paths:
C:\Dir
C:\Dir\data
C:\Dir\data\file1
C:\Dir\data\file2
C:\Dir\data\match1\file1
C:\Dir\data\match1\file2
I only want to print the paths that match the following pattern, where "*" stands for zero or more directory levels and match1 can be the name of either a file or a directory:
C:\Dir\*\match1
I figured re.match() would help with this, but I'm having a hard time defining the pattern, and the one I came up with (pasted below) doesn't work at all. item holds the path as a string.
re.match("((C:\\)(Dir)\\(.*)\\(match1))",item)
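A likely reason the attempt above fails (a reading of the code, not something confirmed in the thread): in an ordinary string literal Python turns each "\\" into a single backslash before re ever sees the pattern, so re receives \( and \) as escaped, literal parentheses and the groups no longer balance. A quick check:

```python
import re

# The pattern from the question, written as a normal (non-raw) string literal.
pattern = "((C:\\)(Dir)\\(.*)\\(match1))"

# Python has already collapsed each "\\" into one backslash,
# so the regex engine sees \) and \( as literal parentheses.
print(pattern)  # ((C:\)(Dir)\(.*)\(match1))

try:
    re.compile(pattern)
    compile_failed = False
except re.error as exc:
    # The remaining parentheses are unbalanced, so compilation fails.
    compile_failed = True
    print("re.error:", exc)
```

Writing the pattern as a raw string (r"...") keeps both backslashes of each "\\", which is what the answers below do.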
Can someone please help me out with this task?
Upvotes: 2
Views: 4422
Reputation: 11
Since I don't have enough reputation to comment yet, I'll remark here.
The solution proposed by @Jan works for the particular list of paths in the question, but has a few problems when applied as a general solution. If the list of paths is as follows:
>>> print(paths)
C:\Dir
C:\Dir\data
C:\Dir\match1
C:\Dir\data\file1
C:\Dir\data\match1\file1
C:\Dir\data\match1\file2
C:\Dir\data\abcmatch1def\file3
C:\Dir\data\file1\match12
C:\Dir\data\file1\match1
>>>
the pattern r'C:\\Dir\\.+?match1.*' fails to match "C:\Dir\match1" and produces false positives, namely "C:\Dir\data\abcmatch1def\file3" and "C:\Dir\data\file1\match12".
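A quick way to check these claims, assuming the paths are held in a Python list rather than the newline-separated string used in this answer:

```python
import re

# Pattern from the earlier answer by @Jan.
rx = re.compile(r'C:\\Dir\\.+?match1.*')

paths = [r'C:\Dir', r'C:\Dir\data', r'C:\Dir\match1', r'C:\Dir\data\file1',
         r'C:\Dir\data\match1\file1', r'C:\Dir\data\match1\file2',
         r'C:\Dir\data\abcmatch1def\file3', r'C:\Dir\data\file1\match12',
         r'C:\Dir\data\file1\match1']

# .+? needs at least one character before "match1", so C:\Dir\match1 is missed;
# nothing requires a backslash around "match1", so partial names slip through.
matched = [p for p in paths if rx.match(p)]
for p in paths:
    print(p in matched, p)
```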
Proposed solution:
>>> import re
>>> for line in paths.splitlines():
...     if re.match(r"C:\\Dir.*\\match1(\\|$)", line):
...         print(line)
...
C:\Dir\match1
C:\Dir\data\match1\file1
C:\Dir\data\match1\file2
C:\Dir\data\file1\match1
>>>
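Outside the interactive console, the same filter could look like the Python 3 sketch below; it stores this answer's paths in a list (an assumption about how they are held). The final check is an addition, not part of the original answer: it shows that the pattern does not require Dir to be a complete directory name, because .* can absorb the rest of a longer name.

```python
import re

# Pattern from the answer above: "\match1" must be followed by a backslash or the end.
rx = re.compile(r"C:\\Dir.*\\match1(\\|$)")

paths = [r"C:\Dir", r"C:\Dir\data", r"C:\Dir\match1", r"C:\Dir\data\file1",
         r"C:\Dir\data\match1\file1", r"C:\Dir\data\match1\file2",
         r"C:\Dir\data\abcmatch1def\file3", r"C:\Dir\data\file1\match12",
         r"C:\Dir\data\file1\match1"]

matched = [p for p in paths if rx.match(p)]
for p in matched:
    print(p)

# Caveat: nothing checks for a backslash right after "Dir",
# so a directory such as C:\Directory is accepted as well.
print(bool(rx.match(r"C:\Directory\match1")))
```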
Upvotes: 0
Reputation: 43169
You could go for:
^C:\\Dir\\.+?match1.*
In Python, this would be:
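import re
# re.match only matches at the start of the string, so the ^ from the pattern above can be left out
rx = re.compile(r'C:\\Dir\\.+?match1.*')
files = [r'C:\Dir', r'C:\Dir\data', r'C:\Dir\data\file1', r'C:\Dir\data\file2',
         r'C:\Dir\data\match1\file1', r'C:\Dir\data\match1\file2']
# The one-element list [rx.match(file)] binds the match object to a name,
# so the if clause can test it and group(0) can return the matched path
filtered = [match.group(0)
            for file in files
            for match in [rx.match(file)]
            if match]
print(filtered)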
Or, if you prefer filter() and a lambda:
filtered = list(filter(lambda x: rx.match(x), files))
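On Python 3.8 or newer, an assignment expression can stand in for the one-element-list trick, and filter() can take rx.match directly instead of a lambda. A sketch, with rx and files repeated so it runs on its own:

```python
import re

rx = re.compile(r'C:\\Dir\\.+?match1.*')
files = [r'C:\Dir', r'C:\Dir\data', r'C:\Dir\data\file1', r'C:\Dir\data\file2',
         r'C:\Dir\data\match1\file1', r'C:\Dir\data\match1\file2']

# Requires Python 3.8+: (m := ...) binds the match object inside the comprehension.
filtered = [m.group(0) for f in files if (m := rx.match(f))]

# filter() accepts the bound method directly; no lambda is needed.
filtered_too = list(filter(rx.match, files))

print(filtered)
print(filtered_too)
```

Because the pattern ends in .*, group(0) is the whole path here, so both lists hold the same strings.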
Upvotes: 1
Reputation: 5789
Your regexp is:
^C:\\Dir\\.*match1
Explanation:
C:\\Dir\\ matches the fixed start of your path
.* matches any other characters in the path (zero or more)
match1 is the literal name that comes after them (file or directory)
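To see what this pattern actually consumes, the sketch below applies it with re.match to the question's paths (written as a list, which is an assumption about how they are stored). Since nothing follows match1 in the pattern, each match ends right after that name, but the match object is still truthy, so it works for filtering:

```python
import re

rx = re.compile(r"^C:\\Dir\\.*match1")

paths = [r"C:\Dir", r"C:\Dir\data", r"C:\Dir\data\file1", r"C:\Dir\data\file2",
         r"C:\Dir\data\match1\file1", r"C:\Dir\data\match1\file2"]

hits = {}
for p in paths:
    m = rx.match(p)
    if m:
        # group(0) stops right after "match1"; the rest of the path is not consumed.
        hits[p] = m.group(0)

for path, part in hits.items():
    print(path, "->", part)
```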
Upvotes: 0