Reputation: 693
I am trying to extract paths that meet some criteria from a given file. For example, I have a small file with contents like this:
contentsaasdf /net/super/file-1.txt othercontents...
data is in /sample/random/folder/folder2/file-2.txt otherdata...
filename /otherfile/other-3.txt somewording
I want to extract the paths from the file that contain file-*.txt.
In the above example, I need the following paths as output:
/net/super/file-1.txt
/sample/random/folder/folder2/file-2.txt
Any suggestions with Python code? I am trying a regex, but I am facing issues with multiple folders, etc. Something like:
FileRegEx = re.compile('.*(file-\\d.txt).*', re.IGNORECASE|re.DOTALL)
Upvotes: 0
Views: 298
Reputation: 107287
You don't need .*; just use character classes properly:
r'[\/\w]+file-[^.]+\.txt'
[\/\w]+ will match any combination of word characters and /, and [^.]+ will match any combination of characters except a dot.
Demo:
https://regex101.com/r/ytsZ0D/1
Note that this regex might be too general. In that case, if you want to exclude some cases, you can add more characters to a negated character class (one that starts with ^) or use another, stricter pattern, based on your needs.
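As a minimal sketch of how this pattern behaves on the sample from the question, the snippet below runs it with re.findall. The variable names (text, pattern, strict) are illustrative, and the stricter variant is only one assumed way of tightening the class by also excluding whitespace.

```python
import re

# Sample text copied from the question.
text = """contentsaasdf /net/super/file-1.txt othercontents...
data is in /sample/random/folder/folder2/file-2.txt otherdata...
filename /otherfile/other-3.txt somewording"""

# Pattern from the answer: a run of word characters and slashes,
# then "file-", then anything except a dot, then ".txt".
pattern = re.compile(r'[\/\w]+file-[^.]+\.txt')
matches = pattern.findall(text)
print(matches)

# Assumed tightening: also exclude whitespace from the negated class,
# so the part after "file-" cannot run across spaces or line breaks.
strict = re.compile(r'[\/\w]+file-[^.\s]+\.txt')
loose_hit = pattern.findall("/tmp/file-a b.txt")
strict_hit = strict.findall("/tmp/file-a b.txt")
print(loose_hit, strict_hit)
```

On the sample text both patterns return the same two paths; they differ only when the part after file- contains whitespace.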
Upvotes: 1
Reputation: 19806
Try this:
import re
re.findall(r'/.+\.txt', s)  # raw string, so \. is not treated as a string escape
# Output: ['/net/super/file-1.txt', '/sample/random/folder/folder2/file-2.txt', '/otherfile/other-3.txt']
Output:
>>> import re
>>>
>>> s = """contentsaasdf /net/super/file-1.txt othercontents...
... data is in /sample/random/folder/folder2/file-2.txt otherdata...
... filename /otherfile/other-3.txt somewording"""
>>>
>>> re.findall(r'/.+\.txt', s)
['/net/super/file-1.txt', '/sample/random/folder/folder2/file-2.txt', '/otherfile/other-3.txt']
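The result above also contains /otherfile/other-3.txt, which the question wanted to leave out. As a rough sketch of one way to narrow it down (a swapped-in approach, not part of this answer), the candidates can be collected with a whitespace-bounded pattern and then filtered by base name with fnmatch. The names text, candidates and wanted are illustrative, and the sketch assumes that paths start with / and contain no whitespace.

```python
import re
from fnmatch import fnmatchcase
from posixpath import basename

# Sample text copied from the question.
text = """contentsaasdf /net/super/file-1.txt othercontents...
data is in /sample/random/folder/folder2/file-2.txt otherdata...
filename /otherfile/other-3.txt somewording"""

# Step 1: collect every whitespace-free token that starts with "/"
# and ends in ".txt" (assumption: paths never contain spaces).
candidates = re.findall(r'/\S+\.txt', text)

# Step 2: keep only the paths whose file name matches the glob file-*.txt.
wanted = [p for p in candidates if fnmatchcase(basename(p), 'file-*.txt')]
print(wanted)
```

Keeping the glob in its own step lets the file-*.txt criterion from the question stay readable instead of being translated into regex syntax.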
Upvotes: 0
Reputation: 8695
Assuming your filenames are white-space separated ...
\\s(\\S+/file-\\d+\\.txt)\\s
\\s - matches a whitespace character
\\S+ - matches one or more non-whitespace characters
\\d+ - matches one or more digits
\\. - turns the . into a literal period, instead of "match any character"
You can avoid the double backslashes using r'' strings:
r'\s(\S+/file-\d+\.txt)\s'
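A short sketch of this pattern applied to the sample from the question follows. The names text, pattern and paths are illustrative. The lookaround variant at the end is an assumed alternative, not part of the answer, for the case where a path sits at the very start or end of the text and has no surrounding whitespace to match.

```python
import re

# Sample text copied from the question.
text = """contentsaasdf /net/super/file-1.txt othercontents...
data is in /sample/random/folder/folder2/file-2.txt otherdata...
filename /otherfile/other-3.txt somewording"""

# The capture group keeps only the path; the surrounding \s
# characters are matched but are not part of the result.
pattern = re.compile(r'\s(\S+/file-\d+\.txt)\s')
paths = pattern.findall(text)
print(paths)

# Assumed variant: lookarounds require "no non-whitespace neighbour"
# instead of an actual whitespace character, so a path at the start
# or end of the input is found as well.
edge_pattern = re.compile(r'(?<!\S)(\S+/file-\d+\.txt)(?!\S)')
edge_paths = edge_pattern.findall("/net/super/file-1.txt")
print(edge_paths)
```

With the original pattern, an input that consists of only the path yields no match, because both \s positions need a real whitespace character.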
Upvotes: 0