Siva

Reputation: 693

Extract path from lines in a file using python

I am trying to extract paths that meet some criteria from a given file. For example, I have a small file with contents like this:

contentsaasdf /net/super/file-1.txt othercontents...
data is in /sample/random/folder/folder2/file-2.txt  otherdata...
filename  /otherfile/other-3.txt somewording

I want to extract the paths from the file that contain file-*.txt.

In the above example, I need the paths below as output:

/net/super/file-1.txt
/sample/random/folder/folder2/file-2.txt

Any suggestions with Python code? I am trying a regex, but I am running into issues with multiple folders, etc. Something like:

 FileRegEx = re.compile('.*(file-\\d.txt).*', re.IGNORECASE|re.DOTALL)

Upvotes: 0

Views: 298

Answers (3)

Kasravnd

Reputation: 107287

You don't need .*; just use character classes properly:

r'[\/\w]+file-[^.]+\.txt'

[\/\w]+ will match any combination of word characters and /, and [^.]+ will match any combination of characters except a dot.
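As a minimal sketch, the pattern above can be applied to the sample text from the question with re.findall. The names PATH_RE, text and paths are only illustrative, not part of the original answer:

```python
import re

# Pattern from the answer: a run of word characters and slashes,
# then "file-", then anything up to the first dot, then ".txt".
PATH_RE = re.compile(r'[\/\w]+file-[^.]+\.txt')

# Sample contents taken from the question.
text = """contentsaasdf /net/super/file-1.txt othercontents...
data is in /sample/random/folder/folder2/file-2.txt  otherdata...
filename  /otherfile/other-3.txt somewording"""

paths = PATH_RE.findall(text)
print(paths)
# ['/net/super/file-1.txt', '/sample/random/folder/folder2/file-2.txt']
```

The third line is skipped because other-3.txt does not contain "file-" directly before the number.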

Demo:

https://regex101.com/r/ytsZ0D/1

Note that this regex might be too general. In that case, if you want to exclude some cases, you can use ^ within a character class or another suitable pattern, based on your needs.

Upvotes: 1

ettanany

Reputation: 19806

Try this:

import re

re.findall(r'/.+\.txt', s)
# Output: ['/net/super/file-1.txt', '/sample/random/folder/folder2/file-2.txt', '/otherfile/other-3.txt']

Output:

>>> import re
>>> 
>>> s = """contentsaasdf /net/super/file-1.txt othercontents...
... data is in /sample/random/folder/folder2/file-2.txt  otherdata...
... filename  /otherfile/other-3.txt somewording"""
>>> 
>>> re.findall(r'/.+\.txt', s)
['/net/super/file-1.txt', '/sample/random/folder/folder2/file-2.txt', '/otherfile/other-3.txt']

Upvotes: 0

AJNeufeld

Reputation: 8695

Assuming your filenames are whitespace-separated ...

\\s(\\S+/file-\\d+\\.txt)\\s
  • \\s - matches a whitespace character
  • \\S+ - matches one or more non-whitespace characters
  • \\d+ - matches one or more digits
  • \\. - matches a literal period, instead of letting . match any character

You can avoid the double backslashes using r'' strings:

r'\s(\S+/file-\d+\.txt)\s'
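A rough sketch of how this pattern might be used line by line. The helper extract_paths and the sample list are hypothetical names for illustration; the sample lines are the ones from the question:

```python
import re

# Raw-string form of the pattern above; the group captures the path
# without the surrounding whitespace.
PATH_RE = re.compile(r'\s(\S+/file-\d+\.txt)\s')

def extract_paths(lines):
    """Collect the captured path from every match, line by line."""
    found = []
    for line in lines:
        found.extend(PATH_RE.findall(line))
    return found

# Lines as they would come from iterating over an open file.
sample = [
    "contentsaasdf /net/super/file-1.txt othercontents...\n",
    "data is in /sample/random/folder/folder2/file-2.txt  otherdata...\n",
    "filename  /otherfile/other-3.txt somewording\n",
]
print(extract_paths(sample))
# ['/net/super/file-1.txt', '/sample/random/folder/folder2/file-2.txt']
```

An open file object can be passed in place of the list, since iterating over it yields lines. Because the pattern requires whitespace on both sides, a path at the very start of a line would not match.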

Upvotes: 0
