Reputation: 455
I am trying to search for a specific pattern at the beginning of each line in the text file below:
`I_DIG(DN, PSUP, NSUP)
`I_DIG(FAST_START, PSUP, NSUP)
`IO_DIG(TEST, PSUP, NSUP)
`I_ANA(IBIAS_200N)
Random text
`SUP_ANA(NSUP)
`I_ANA(VREF)
`I_VEC_DIG(1, 0, DEGEN_TRIM, PSUP, NSUP)
`I_VEC_DIG(1, 0, GAIN_SEL, PSUP, NSUP)
`O_ANA(IOUTN)
`O_ANA(IOUTP)
`O_VEC_ANA(1, 0, IBIAS_OUT)
`O_VEC_ANA(1, 0, ICAL)
`O_DIG(OUT,PSUP,NSUP)
`IO_ANA(TEST2)
Garbage text
`IO_DIG(TEST3,PSUP_HV,NSUP_HV)
I would like to find every line starting with I_, IO_, O_, or SUP_ and, once I have a match, capture every string on that line in its own group.
Here is the regex I'm using:
r'^(`I_\w+|`IO_\w+|`SUP_(\w+)|`O_\w+)(\s*\()(\s*\d*,*)(\s*\d*,*)(\s*(\w+),)(\s*(\w+),)(\s*(\w+)\))',flags=re.M
This captures all the lines I need except I_ANA, SUP_ANA, IO_ANA, O_ANA, and I_VEC_ANA. Maybe I need to write a separate regex for lines that contain 'ANA'?
What is the best regex you would recommend to capture these lines and put every string in that line in a group?
Thanks.
Upvotes: 0
Views: 52
Reputation: 77399
No need to solve everything in a single regular expression.
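import re

def get_data(text):
    # text is an iterable of lines (an open file, or a string's splitlines())
    for line in text:
        # Keep only lines that start with one of the wanted prefixes
        if re.match(r"^`?(I|IO|O|SUP)_", line):
            # Skip the leading backtick, then capture the name and the argument list
            m = re.search(r'`?(.+?)\((.+?)\)', line)
            if m:
                yield {
                    "fn": m.group(1),
                    "args": re.split(r',\s*', m.group(2))
                }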
Testing:
>>> for line in get_data(text):
...     print(line)
...
{'fn': 'I_DIG', 'args': ['DN', 'PSUP', 'NSUP']}
{'fn': 'I_DIG', 'args': ['FAST_START', 'PSUP', 'NSUP']}
{'fn': 'IO_DIG', 'args': ['TEST', 'PSUP', 'NSUP']}
{'fn': 'I_ANA', 'args': ['IBIAS_200N']}
{'fn': 'SUP_ANA', 'args': ['NSUP']}
{'fn': 'I_ANA', 'args': ['VREF']}
{'fn': 'I_VEC_DIG', 'args': ['1', '0', 'DEGEN_TRIM', 'PSUP', 'NSUP']}
{'fn': 'I_VEC_DIG', 'args': ['1', '0', 'GAIN_SEL', 'PSUP', 'NSUP']}
{'fn': 'O_ANA', 'args': ['IOUTN']}
{'fn': 'O_ANA', 'args': ['IOUTP']}
{'fn': 'O_VEC_ANA', 'args': ['1', '0', 'IBIAS_OUT']}
{'fn': 'O_VEC_ANA', 'args': ['1', '0', 'ICAL']}
{'fn': 'O_DIG', 'args': ['OUT', 'PSUP', 'NSUP']}
{'fn': 'IO_ANA', 'args': ['TEST2']}
{'fn': 'IO_DIG', 'args': ['TEST3', 'PSUP_HV', 'NSUP_HV']}
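If a single pattern is still preferred, one possible variant is sketched below. The pattern in the question requires three comma-separated word fields before the closing parenthesis, which is why single-argument lines such as I_ANA(VREF) never match. Capturing the whole argument list as one group and splitting it afterwards avoids that. The names MACRO_RE and parse_macros are made up for this illustration, and the sketch assumes the input is one string with no nested parentheses.

```python
import re

# Hypothetical helper: one multiline pattern; the argument list is captured
# as a single group and split afterwards, so any argument count works.
MACRO_RE = re.compile(
    r"^`(?P<fn>(?:I|IO|O|SUP)_\w+)\s*\((?P<args>[^)]*)\)",
    flags=re.M,
)

def parse_macros(text):
    """Return a list of {'fn', 'args'} dicts for matching lines in a string."""
    results = []
    for m in MACRO_RE.finditer(text):
        args = [a.strip() for a in m.group("args").split(",")]
        results.append({"fn": m.group("fn"), "args": args})
    return results

sample = (
    "`I_ANA(VREF)\n"
    "Random text\n"
    "`O_VEC_ANA(1, 0, ICAL)\n"
    "`IO_DIG(TEST3,PSUP_HV,NSUP_HV)"
)
parsed = parse_macros(sample)
```

Here parsed holds one dict per macro line, and the "Random text" line is skipped because it does not start with a backtick followed by one of the prefixes.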
Upvotes: 1
Reputation: 3382
If all you are doing is matching on what a line starts with, why not use str.startswith? You can pass in a tuple of prefixes, and no regular expressions are needed.
This reads from the file you linked to:
>>> with open("test.vams", "r") as f:
...     for line in f:
...         if line.startswith(('`I_', '`IO_', '`O_', '`SUP_')):
...             fn, args = line.strip('`)\n').split('(')
...             args = [arg.strip() for arg in args.split(',')]
...             print({'fn': fn, 'args': args})
...
{'fn': 'SUP_ANA', 'args': ['NSUP']}
{'fn': 'SUP_ANA', 'args': ['PSUP']}
{'fn': 'I_DIG', 'args': ['SEL', 'PSUP', 'NSUP']}
{'fn': 'I_ANA', 'args': ['A']}
{'fn': 'O_ANA', 'args': ['B']}
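The same idea can be wrapped in a small function that works on any string, which makes it easier to try without the file. This is only a sketch: parse_line and PREFIXES are names invented here, and it assumes each macro sits on one line and ends with a closing parenthesis. Using str.partition instead of split('(') means a line without an opening parenthesis returns None rather than raising on unpacking.

```python
# Hypothetical names for illustration; the prefixes come from the question.
PREFIXES = ("`I_", "`IO_", "`O_", "`SUP_")

def parse_line(line):
    """Return {'fn', 'args'} for a macro line, or None if it does not match."""
    line = line.strip()
    if not line.startswith(PREFIXES) or not line.endswith(")"):
        return None
    # Drop the leading backtick and the trailing ')', then split at the first '('
    fn, sep, rest = line[1:-1].partition("(")
    if not sep:
        return None
    return {"fn": fn.strip(), "args": [a.strip() for a in rest.split(",")]}

examples = ["`SUP_ANA(NSUP)\n", "Random text\n", "`O_DIG(OUT,PSUP,NSUP)\n"]
results = [parse_line(line) for line in examples]
```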
Upvotes: 1
Reputation: 6543
Here's a non-regex solution; the variable data contains the input string you've read from the file:
prefixes = {'I', 'IO', 'O', 'SUP'}
lines = [line for line in data.split('\n') if '_' in line and
         line.strip('`').split('_')[0] in prefixes]
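A quick way to see what this filter keeps is to run it on a few lines from the question. The sample string below is an assumption made for illustration. Note that the comprehension only selects lines and does not split them into groups, and that a line without the leading backtick (for example "O_plain") would also pass, since strip('`') leaves it unchanged.

```python
# Sample input assembled from lines in the question, plus one made-up line
# ("O_plain") that shows the backtick is not actually required by the filter.
data = (
    "`I_DIG(DN, PSUP, NSUP)\n"
    "Random text\n"
    "`SUP_ANA(NSUP)\n"
    "Garbage text\n"
    "`IO_ANA(TEST2)\n"
    "O_plain"
)

prefixes = {'I', 'IO', 'O', 'SUP'}
lines = [line for line in data.split('\n') if '_' in line and
         line.strip('`').split('_')[0] in prefixes]
```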
Upvotes: 1