Reputation: 2207
import re
strings = [
    r"C:\Photos\Selfies\1|",
    r"C:\HDPhotos\Landscapes\2|",
    r"C:\Filters\Pics\12345678|",
    r"C:\Filters\Pics2\00000000|",
    r"C:\Filters\Pics2\00000000|XAV7"
]
for string in strings:
    matchptrn = re.match(r"(?P<file_path>.*)(?!\d{8})", string)
    if matchptrn:
        print("FILE PATH = "+matchptrn.group('file_path'))
I am trying to get this regular expression with a lookahead to work the way I thought it would. Examples of lookarounds on most websites seem to be pretty basic string matches, e.g. not matching 'bar' if it is preceded by 'foo' as an example of a negative lookbehind.
My goal is to capture in the group file_path
the actual file path only if the string does NOT have an 8-digit number just before the pipe symbol |
and match anything after the pipe symbol in another group (something I haven't implemented here).
So in the above example it should match only the first two strings
C:\Photos\Selfies\1
C:\HDPhotos\Landscapes\2
In case of the last string
C:\Filters\Pics2\00000000|XAV7
I'd like to match C:\Filters\Pics2\00000000
in <file_path>
and match XAV7
in another group named .
(This is something I can figure out on my own if I get some help with the negative look ahead)
Currently <file_path> matches everything, which makes sense since (.*) is greedy. I want it to capture only if the last part of the string before the pipe symbol is NOT an 8-digit number.
OUTPUT OF CODE SNIPPET PASTED BELOW
FILE PATH = C:\Photos\Selfies\1|
FILE PATH = C:\HDPhotos\Landscapes\2|
FILE PATH = C:\Filters\Pics\12345678|
FILE PATH = C:\Filters\Pics2\00000000|
FILE PATH = C:\Filters\Pics2\00000000|XAV7
Adding a literal backslash (\\) before the lookahead
    matchptrn = re.match(r"(?P<file_path>.*)\\(?!\d{8})", string)
    if matchptrn:
        print("FILE PATH = "+matchptrn.group('file_path'))
makes things worse as the output is
FILE PATH = C:\Photos\Selfies
FILE PATH = C:\HDPhotos\Landscapes
FILE PATH = C:\Filters
FILE PATH = C:\Filters
FILE PATH = C:\Filters
Can someone please explain this as well?
Upvotes: 2
Views: 162
Reputation: 626845
You can use
^(?!.*\\\d{8}\|$)(?P<file_path>.*)\|(?P<suffix>.*)
See the regex demo.
Details
^ - start of the string
(?!.*\\\d{8}\|$) - fail the match if the string contains \ followed by eight digits and then | at the end of the string
(?P<file_path>.*) - Group "file_path": any zero or more chars other than line break chars, as many as possible
\| - a pipe
(?P<suffix>.*) - Group "suffix": the rest of the string, any zero or more chars other than line break chars, as many as possible.
See the Python demo:
import re
strings = [
    r"C:\Photos\Selfies\1|",
    r"C:\HDPhotos\Landscapes\2|",
    r"C:\Filters\Pics\12345678|",
    r"C:\Filters\Pics2\00000000|",
    r"C:\Filters\Pics2\00000000|XAV7"
]
for string in strings:
    matchptrn = re.match(r"(?!.*\\\d{8}\|$)(?P<file_path>.*)\|(?P<suffix>.*)", string)
    if matchptrn:
        print("FILE PATH = {}, SUFFIX = {}".format(*matchptrn.groups()))
Output:
FILE PATH = C:\Photos\Selfies\1, SUFFIX =
FILE PATH = C:\HDPhotos\Landscapes\2, SUFFIX =
FILE PATH = C:\Filters\Pics2\00000000, SUFFIX = XAV7
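As for why the two patterns in the question behave the way they do, here is a minimal sketch that reproduces both on a single input. The variable names (sample, greedy_only, with_backslash) are only illustrative. The point is that a lookahead is checked at the position where the regex engine currently stands, which after a greedy .* is the end of the string.

```python
import re

sample = r"C:\Filters\Pics\12345678|"

# Attempt 1: the greedy .* consumes the whole string first. The negative
# lookahead is then tested at the very end, where nothing follows, so
# "not followed by eight digits" is trivially true and everything matches.
greedy_only = re.match(r"(?P<file_path>.*)(?!\d{8})", sample).group("file_path")
print(greedy_only)

# Attempt 2: a literal backslash must now follow the group, so .* backtracks
# to the last backslash that is NOT followed by eight digits. The backslash
# before 12345678 is rejected, and the engine settles on the one after
# "Filters", which is why the captured path is cut short.
with_backslash = re.match(r"(?P<file_path>.*)\\(?!\d{8})", sample).group("file_path")
print(with_backslash)
```

This is also why the pattern above puts the lookahead right at the start: from there, .* inside the lookahead can scan the whole string before anything is consumed.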
Upvotes: 1