Dhiwakar Ravikumar

Reputation: 2207

How to check if a given pathname matches a given regular expression in Python

I'm trying to figure out a way to compare each directory path against a given regular expression to find out whether it matches that pattern.

I have the following list of paths:

C:\Dir
C:\Dir\data
C:\Dir\data\file1
C:\Dir\data\file2
C:\Dir\data\match1\file1
C:\Dir\data\match1\file2

I only want to print the paths that match the following pattern, where "*" can stand for zero or more directory levels and match1 can be the name of either a file or a directory.

C:\Dir\*\match1
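To make the intended meaning of "*" concrete, here is a minimal sketch (Python 3) that translates such a pattern into a regular expression. The helper name dir_pattern_to_regex is hypothetical; it assumes "*" only ever appears as a whole middle segment, and matching is case-sensitive as written.

```python
import re


def dir_pattern_to_regex(pattern):
    """Hypothetical helper: compile a backslash-separated pattern in which a
    lone * segment stands for zero or more directory levels."""
    parts = pattern.split("\\")
    pieces = []
    for part in parts[:-1]:
        if part == "*":
            # Zero or more complete directory levels, each a name plus "\".
            pieces.append(r"(?:[^\\]+\\)*")
        else:
            # A literal directory name followed by its separator.
            pieces.append(re.escape(part) + r"\\")
    # The last segment must be a complete file or directory name...
    pieces.append(re.escape(parts[-1]))
    # ...optionally followed by anything beneath it, if it is a directory.
    pieces.append(r"(?:\\.*)?")
    return re.compile("".join(pieces))


rx = dir_pattern_to_regex(r"C:\Dir\*\match1")

paths = [
    r"C:\Dir",
    r"C:\Dir\data",
    r"C:\Dir\data\file1",
    r"C:\Dir\data\file2",
    r"C:\Dir\data\match1\file1",
    r"C:\Dir\data\match1\file2",
]

# fullmatch() requires the whole path to fit the pattern, not just a prefix.
for path in paths:
    if rx.fullmatch(path):
        print(path)
```

With these paths it prints C:\Dir\data\match1\file1 and C:\Dir\data\match1\file2. Because each "*" level is a whole name plus its separator, a name that merely contains match1 (such as abcmatch1def) cannot stand in for it.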

I figured out that re.match() would help me with this, but I'm having a hard time defining the pattern, and the one I came up with (pasted below) doesn't work at all. Here item holds the path as a string.

re.match("((C:\\)(Dir)\\(.*)\\(match1))",item)
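This pattern fails because it is an ordinary string literal rather than a raw one, so Python collapses each \\ into a single backslash before the regex engine sees it. A minimal sketch (Python 3) showing what the engine actually receives:

```python
import re

# Without the r prefix, Python turns each "\\" into one backslash
# before the regex engine ever sees the pattern.
broken = "((C:\\)(Dir)\\(.*)\\(match1))"
print(broken)  # ((C:\)(Dir)\(.*)\(match1))

# The engine then reads "\)" and "\(" as literal parentheses, so the
# remaining grouping parentheses no longer balance and compiling fails.
compile_error = None
try:
    re.compile(broken)
except re.error as exc:
    compile_error = exc
print("re.error:", compile_error)

# Written as a raw string, "\\" reaches the engine as an escaped backslash.
fixed = re.compile(r"((C:\\)(Dir)\\(.*)\\(match1))")
print(bool(fixed.match(r"C:\Dir\data\match1\file1")))  # True
print(bool(fixed.match(r"C:\Dir\data\file1")))         # False
```

Even the raw-string version still requires at least one directory between Dir and match1, because both \\ separators around (.*) are mandatory.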

Can someone please help me with this task?

Upvotes: 2

Views: 4422

Answers (3)

DigiGen

Reputation: 11

Since I don't have enough reputation to comment yet, I'll remark here.

The solution proposed by @Jan works for the particular list of paths in the question, but it has a few problems when applied as a general solution. Suppose the list of paths is as follows:

>>> print(paths)
C:\Dir
C:\Dir\data
C:\Dir\match1
C:\Dir\data\file1
C:\Dir\data\match1\file1
C:\Dir\data\match1\file2
C:\Dir\data\abcmatch1def\file3
C:\Dir\data\file1\match12
C:\Dir\data\file1\match1
>>>

Then the pattern r'C:\\Dir\\.+?match1.*' fails to match "C:\Dir\match1" and produces false positives, namely "C:\Dir\data\abcmatch1def\file3" and "C:\Dir\data\file1\match12".
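These three cases are easy to reproduce. A minimal sketch (Python 3) using Jan's pattern with re.match():

```python
import re

# The pattern from the answer by @Jan.
jan = re.compile(r"C:\\Dir\\.+?match1.*")

cases = [
    r"C:\Dir\match1",                   # should match, but does not
    r"C:\Dir\data\abcmatch1def\file3",  # should not match, but does
    r"C:\Dir\data\file1\match12",       # should not match, but does
]
for path in cases:
    print(path, bool(jan.match(path)))
```

The first case fails because .+? must consume at least one character after C:\Dir\, leaving too little text for match1.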

Proposed solution:

>>> import re
>>> for line in paths.splitlines():
...     if re.match(r"C:\\Dir.*\\match1(\\|$)", line):
...             print(line)
...
C:\Dir\match1
C:\Dir\data\match1\file1
C:\Dir\data\match1\file2
C:\Dir\data\file1\match1
>>>
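Outside the interactive prompt, the same check is a plain loop. One caveat: because .* follows Dir directly, this pattern also accepts a sibling directory such as C:\Directory\match1. The sketch below (Python 3) compares it with a stricter variant; the name strict and its pattern are an assumption for illustration, not part of this answer.

```python
import re

# DigiGen's pattern: "\match1" must be followed by a backslash or the end.
digigen = re.compile(r"C:\\Dir.*\\match1(\\|$)")

# Hypothetical tightening: anything after "Dir" must start with a backslash,
# so sibling folders such as C:\Directory are excluded.
strict = re.compile(r"C:\\Dir(?:\\.*)?\\match1(?:\\|$)")

for path in [r"C:\Dir\match1", r"C:\Dir\data\match1\file1", r"C:\Directory\match1"]:
    print(path, bool(digigen.match(path)), bool(strict.match(path)))
```

Both patterns agree on the first two paths; only the stricter one rejects C:\Directory\match1.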

Upvotes: 0

Jan

Reputation: 43169

You could go for:

^C:\\Dir\\.+?match1.*

See a demo on regex101.com.


In Python, this would be:

import re

rx = re.compile(r'C:\\Dir\\.+?match1.*')

files = [r'C:\Dir', r'C:\Dir\data', r'C:\Dir\data\file1', r'C:\Dir\data\file2', r'C:\Dir\data\match1\file1', r'C:\Dir\data\match1\file2']

filtered = [match.group(0) 
            for file in files 
            for match in [rx.match(file)] 
            if match]

print(filtered)

Or, if you prefer filter() with a lambda:

filtered = list(filter(lambda x: rx.match(x), files))
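On Python 3.8 or later, the "for match in [rx.match(file)]" trick in the comprehension can be replaced by an assignment expression, and filter() can take rx.match directly without a lambda. A minimal sketch with the same pattern and list:

```python
import re

rx = re.compile(r'C:\\Dir\\.+?match1.*')

files = [r'C:\Dir', r'C:\Dir\data', r'C:\Dir\data\file1', r'C:\Dir\data\file2',
         r'C:\Dir\data\match1\file1', r'C:\Dir\data\match1\file2']

# Bind the match object inline (Python 3.8+ assignment expression).
filtered = [m.group(0) for f in files if (m := rx.match(f))]

# rx.match is already a one-argument callable, so no lambda is needed.
filtered_too = list(filter(rx.match, files))

print(filtered)
print(filtered_too)
```

Since the pattern ends in .*, group(0) is the whole path here, so both lists are identical.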

Upvotes: 1

valex

Reputation: 5789

Your regexp is:

^C:\\Dir\\.*match1

Explanation:

^C:\\Dir\\ matches the fixed start of your path (^ anchors it at the beginning of the string)

.* matches any other characters in the path

match1 is the explicit name of the file or directory that comes after
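A minimal sketch (Python 3) applying this pattern with re.match(). Because .* may be empty, it also matches C:\Dir\match1 directly; like Jan's pattern, though, it accepts names that merely contain match1, such as abcmatch1def.

```python
import re

# valex's pattern; re.match() already anchors at the start, so ^ is redundant there.
rx = re.compile(r"^C:\\Dir\\.*match1")

for path in [
    r"C:\Dir\data\file1",
    r"C:\Dir\match1",
    r"C:\Dir\data\match1\file1",
    r"C:\Dir\data\abcmatch1def\file3",
]:
    print(path, bool(rx.match(path)))
```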

Upvotes: 0
