Reputation: 3852
Hello, I'm currently trying to parse a script that contains paths to files similar to the ones given below. I would like to parse the file using regular expressions and store the file names in a single string, separated by '\n'. An example file is given below.
SAMPLE FILE: ('#' marks a commented-out line, which I would like to keep commented out)
add file -tls "../path1/path2/path3/example_1.edf"
add file -tls "../path1/path2/path3/example_1.v"
add file -tls "../path1/path2/path3/exa_4mple_1.sv"
add file -tls "../path1/path2/path3/example_1.vh"
#add file -tls "../path1/path2/path3/exa_0mple_1.vhd"
SAMPLE OUTPUT: (this example excludes the '\n' character)
example_1.v
exa_4mple_1.sv
example_1.vh
#exa_0mple_1.vhd
How can I construct the extension part of the 're' so that it only includes the above extensions and excludes others? I'm also wondering if it's possible to catch the '#' on commented-out lines and prepend it to the file name.
-Desired function:
for match in re.finditer(r'/([A-Za-z0-9_]+\..+)"', contents):
    mylist.append(match.group(1))
-Working Code: ( tested on the '.v' file case )
re.finditer(r'/([A-Za-z0-9_]+\.v)"', contents)
Upvotes: 0
Views: 696
Reputation: 27575
Is this what you want?
import re
contents = '''
add file -tls "../path1/path2/path3/example_1.edf"
add file -tls "../path1/path2/path3/example_1.v"
add file -tls "../path1/path2/path3/exa_4mple_1.sv"
add file -tls "../path1/path2/path3/example_1.vh"
#add file -tls "../path1/path2/path3/exa_0mple_1.vhd"
'''
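print(contents)
# Group 1: optional leading '#'; group 2: file name whose extension contains 'v'
pat = r'^(#?)add file.+?"\.\./(?:\w+/)*(\w+?\.\w*v\w*)"\s*$'
# Join '#' (if present) and the file name for every matching line
gen = (''.join(mat.groups())
       for mat in re.finditer(pat, contents, re.MULTILINE))
print('\n'.join(gen))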
The pattern catches paths whose extension contains the letter 'v'; that's what I understood from your question.
I also made the string add file a criterion for matching, according to your example, and I used \w for the directory and file names.
With this pattern, all paths are supposed to begin with ../
If any of these characteristics doesn't suit your case, we'll change what needs to be changed.
Note that I put \s* at the end of the pattern, in case there is whitespace in the line after the path.
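If the set of wanted extensions is known in advance, the extension part can be written as an explicit alternation rather than "anything containing v". The following is only a sketch: it assumes the wanted extensions are v, sv, vh and vhd (read off the sample output), and the names FILE_LINE and extract_names are made up for illustration.

```python
import re

# Assumption: the wanted extensions are the ones in the sample output.
# Group 1 is the optional leading '#', group 2 is the bare file name.
FILE_LINE = re.compile(
    r'^(#?)[ \t]*add file\b[^"\n]*"(?:[^"/\n]*/)*(\w+\.(?:v|sv|vh|vhd))"[ \t]*$',
    re.MULTILINE,
)

def extract_names(contents):
    """Return file names, keeping a leading '#' for commented-out lines."""
    return [m.group(1) + m.group(2) for m in FILE_LINE.finditer(contents)]

sample = '''
add file -tls "../path1/path2/path3/example_1.edf"
add file -tls "../path1/path2/path3/example_1.v"
add file -tls "../path1/path2/path3/exa_4mple_1.sv"
add file -tls "../path1/path2/path3/example_1.vh"
#add file -tls "../path1/path2/path3/exa_0mple_1.vhd"
'''

print('\n'.join(extract_names(sample)))
```

Unlike the pattern above, this variant does not require the path to begin with ../ and it rejects extensions such as .csv that merely contain a 'v'.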
Upvotes: 1
Reputation: 59974
Regular expressions are not needed:
>>> import os
>>> L = [
... "/path1/path2/path3/example_1.edf",
... "/path1/path2/path3/example_1.v",
... "/path1/path2/path3/exa_4mple_1.sv",
... "/path1/path2/path3/example_1.vh" ]
>>> for mypath in L:
...     if mypath.split('.')[-1] in ('v', 'sv', 'vh'):
...         print(os.path.split(mypath)[1])
...
example_1.v
exa_4mple_1.sv
example_1.vh
Or as a list comprehension:
>>> [os.path.split(mypath)[1]
... for mypath in L
... if mypath.split('.')[-1] in ('v', 'sv', 'vh')]
['example_1.v', 'exa_4mple_1.sv', 'example_1.vh']
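The same idea can be applied directly to the lines of the script, including the '#' prefix the question asks for. This is a rough sketch rather than a definitive solution: it assumes every relevant line has the form add file ... "path" with the path in double quotes, that the wanted extensions are those in the sample output, and the names WANTED and names_from_script are hypothetical.

```python
import os

# Assumption: the wanted extensions are the ones in the sample output.
WANTED = {'.v', '.sv', '.vh', '.vhd'}

def names_from_script(contents):
    """Collect wanted file names, prefixing '#' for commented-out lines."""
    names = []
    for line in contents.splitlines():
        stripped = line.strip()
        prefix = '#' if stripped.startswith('#') else ''
        # Splitting on '"' puts the quoted path in parts[1].
        parts = stripped.split('"')
        if len(parts) < 3 or 'add file' not in parts[0]:
            continue  # not an add file "..." line
        name = os.path.basename(parts[1])
        if os.path.splitext(name)[1] in WANTED:
            names.append(prefix + name)
    return names

script = '''
add file -tls "../path1/path2/path3/example_1.edf"
add file -tls "../path1/path2/path3/example_1.v"
add file -tls "../path1/path2/path3/exa_4mple_1.sv"
add file -tls "../path1/path2/path3/example_1.vh"
#add file -tls "../path1/path2/path3/exa_0mple_1.vhd"
'''

print('\n'.join(names_from_script(script)))
```

Using os.path.splitext compares the real extension, so a file name with several dots is judged only by its last suffix.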
Upvotes: 1