Rick Zhang

Reputation: 89

Extract file names from a given directory with regex

I am pretty weak in regex. I'm looking for some help with how to extract the .sav file name from the following string:

C:\Users...\Standard Loadflows Seq and Dyn PSSEv34 - 2019-02-20\AutumnHi-20180531-183047-34-SystemNormal\AutumnHi-20180531-183047-34-SystemNormal.sav

Currently I am using this code:

re.findall(r'\\(.+).sav',txt)

but it only finds

['Users\\...\\Standard Loadflows Seq and Dyn PSSEv34 - 2019-02-20\\AutumnHi-20180531-183047-34-SystemNormal\AutumnHi-20180531-183047-34-SystemNormal.sav was']

I'm trying to find "AutumnHi-20180531-183047-34-SystemNormal.sav"

I am using Python 3.7.

Upvotes: 1

Views: 183

Answers (5)

Emma

Reputation: 27723

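Either of these expressions should extract the file name:

[^\\]+\.sav
([^\\]+\.sav)

Here [^\\]+ matches a run of characters that are not backslashes, and \.sav matches the literal extension, so the match cannot span a path separator. The second form only adds a capture group.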

Test

import re

print(re.findall(r"([^\\]+\.sav)", "C:\\Users...\\Standard Loadflows Seq and Dyn PSSEv34 - 2019-02-20\\AutumnHi-20180531-183047-34-SystemNormal\\AutumnHi-20180531-183047-34-SystemNormal.sav"))

Output

['AutumnHi-20180531-183047-34-SystemNormal.sav']

Demo

Upvotes: 0

Barry Scott

Reputation: 819

I am assuming you are not learning about regex but want to know how to handle parsing filenames.

I would use the pathlib module to handle parsing the filename.

C:\Users\barry>py -3.7
Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pathlib
>>> filename = r'C:\Users\...\Standard Loadflows Seq and Dyn PSSEv34 - 2019-02-20\WinterLo-20180729-043047-34-SystemNormal\WinterLo-20180729-043047-34-SystemNormal.sav'
>>> path = pathlib.Path(filename)
>>> path.name
'WinterLo-20180729-043047-34-SystemNormal.sav'
>>> path.parent
WindowsPath('C:/Users/.../Standard Loadflows Seq and Dyn PSSEv34 - 2019-02-20/WinterLo-20180729-043047-34-SystemNormal')
>>>
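The session above was run on Windows, where pathlib.Path produces a WindowsPath. On Linux or macOS, pathlib.Path would treat the backslashes as ordinary characters, so a rough sketch that should work on any platform could use pathlib.PureWindowsPath instead. The variable names here are only illustrative, and the "..." segment is kept exactly as elided in the question:

```python
from pathlib import PureWindowsPath

# Example path; the "..." segment is left elided as in the question.
filename = r'C:\Users\...\Standard Loadflows Seq and Dyn PSSEv34 - 2019-02-20\WinterLo-20180729-043047-34-SystemNormal\WinterLo-20180729-043047-34-SystemNormal.sav'

# PureWindowsPath parses Windows-style paths without touching the file system,
# regardless of the operating system the script runs on.
path = PureWindowsPath(filename)

sav_name = path.name      # file name with extension
sav_stem = path.stem      # file name without extension
sav_suffix = path.suffix  # extension, including the dot

print(sav_name)
print(sav_stem)
print(sav_suffix)
```

Checking path.suffix == '.sav' is one way to confirm the path really points at a .sav file before using the name.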

Upvotes: 0

The fourth bird

Reputation: 163352

You could match a backslash and then capture, in a group, one or more characters that are not a backslash using a negated character class, followed by a literal dot and sav.

A negative lookahead then asserts that what is directly to the right is not a non-whitespace character, so the match has to end at whitespace or at the end of the string.

\\([^\\]+\.sav)(?!\S)

Regex demo
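In Python, the pattern could be applied roughly as in this minimal sketch. The sample text is an assumption made for illustration: it is the path from the question followed by a couple of extra words, since the output shown in the question suggests the path is embedded in a longer string.

```python
import re

# Assumed sample text: the question's path followed by extra words.
# The "..." is left elided as in the question.
txt = (r'C:\Users...\Standard Loadflows Seq and Dyn PSSEv34 - 2019-02-20'
       r'\AutumnHi-20180531-183047-34-SystemNormal'
       r'\AutumnHi-20180531-183047-34-SystemNormal.sav was saved')

# Backslash, then a captured run of non-backslash characters ending in ".sav",
# which must not be directly followed by a non-whitespace character.
pattern = re.compile(r'\\([^\\]+\.sav)(?!\S)')

# findall returns the contents of the capture group for each match.
names = pattern.findall(txt)
print(names)
```

Because findall returns only the capture group, the leading backslash is not part of the result.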

Upvotes: 1

SanV

Reputation: 945

The following pattern should match the filename:
(?=[^\\]*$).*\.sav

Regex101 Demo

The (?=[^\\]*$) part is a positive lookahead asserting that no backslash occurs between the current position and the end of the string. The first position where that holds is just after the last backslash, so .*\.sav then matches the file name from there. For other details, see "EXPLANATION" on the right side of the regex101 demo.
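A minimal Python sketch of how this pattern might be used, assuming the input is the path string from the question and that nothing containing a backslash follows the file name:

```python
import re

# The path from the question; the "..." is left elided as given there.
txt = (r'C:\Users...\Standard Loadflows Seq and Dyn PSSEv34 - 2019-02-20'
       r'\AutumnHi-20180531-183047-34-SystemNormal'
       r'\AutumnHi-20180531-183047-34-SystemNormal.sav')

# re.search tries each position in turn; the lookahead first succeeds
# right after the last backslash, and .*\.sav then matches the file name.
match = re.search(r'(?=[^\\]*$).*\.sav', txt)

# Guard against inputs that contain no .sav file name at all.
name = match.group(0) if match else None
print(name)
```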

Upvotes: 0

Andrej Kesely

Reputation: 195438

Regex101 (link):

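import re

# Raw string, so each backslash is written only once.
txt = r'''C:\Users\...\Standard Loadflows Seq and Dyn PSSEv34 - 2019-02-20\WinterLo-20180729-043047-34-SystemNormal\WinterLo-20180729-043047-34-SystemNormal.sav'''

# Lookbehind for a backslash, then non-backslash characters ending in a literal ".sav".
print(re.findall(r'(?<=\\)[^\\]+\.sav', txt)[0])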

Prints:

WinterLo-20180729-043047-34-SystemNormal.sav

You could achieve the same without the re module:

print(txt.split('\\')[-1])
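Another option without re, sketched here as an alternative rather than something from the original code, is the standard library's ntpath module, which implements Windows path rules and can be imported on any platform. The variable names below are illustrative:

```python
import ntpath

# Example path; the "..." segment is left elided as in the question.
txt = r'C:\Users\...\Standard Loadflows Seq and Dyn PSSEv34 - 2019-02-20\WinterLo-20180729-043047-34-SystemNormal\WinterLo-20180729-043047-34-SystemNormal.sav'

# basename returns everything after the last path separator.
base = ntpath.basename(txt)

# splitext separates the extension so it can be checked explicitly.
stem, ext = ntpath.splitext(base)

print(base)
print(ext)
```

Unlike a plain split, this also lets you verify that ext is '.sav' before trusting the result.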

Upvotes: 0
