Reputation: 455

Search for a line in a text file with particular pattern

I am trying to search for a specific pattern at the beginning of a line in the text file below:

`I_DIG(DN, PSUP, NSUP)
`I_DIG(FAST_START, PSUP, NSUP)
`IO_DIG(TEST, PSUP, NSUP)
`I_ANA(IBIAS_200N)
Random text
`SUP_ANA(NSUP)
`I_ANA(VREF)
`I_VEC_DIG(1, 0, DEGEN_TRIM, PSUP, NSUP)
`I_VEC_DIG(1, 0, GAIN_SEL, PSUP, NSUP)
`O_ANA(IOUTN)
`O_ANA(IOUTP)
`O_VEC_ANA(1, 0, IBIAS_OUT)
`O_VEC_ANA(1, 0, ICAL)
`O_DIG(OUT,PSUP,NSUP)
`IO_ANA(TEST2)
Garbage text
`IO_DIG(TEST3,PSUP_HV,NSUP_HV)

I would like to find every line starting with I_, IO_, O_ or SUP_ and, once a line matches, capture each string on that line in its own group. Here is the regex I'm using:

r'^(`I_\w+|`IO_\w+|`SUP_(\w+)|`O_\w+)(\s*\()(\s*\d*,*)(\s*\d*,*)(\s*(\w+),)(\s*(\w+),)(\s*(\w+)\))',flags=re.M

This captures all the lines I need except I_ANA, SUP_ANA, IO_ANA, O_ANA and I_VEC_ANA. Maybe I need to write a separate regex when the string contains 'ANA'?

What is the best regex you would recommend to capture these lines and put every string in that line in a group?

Thanks.

Upvotes: 0

Answers (3)

Paulo Scardine

Reputation: 77399

No need to solve everything in a single regular expression.

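import re

def get_data(text):
    # `text` is any iterable of lines, e.g. an open file or str.splitlines()
    for line in text:
        if re.match(r"^`?(I|IO|O|SUP)_", line):
            # Keep the optional leading backtick out of the captured name
            m = re.search(r'`?(.+?)\((.+?)\)', line)
            if m:
                yield {
                    "fn": m.group(1),
                    "args": re.split(r',\s*', m.group(2))
                }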

Testing:

>>> for line in get_data(text):
...     print(line)
...

{'fn': 'I_DIG', 'args': ['DN', 'PSUP', 'NSUP']}
{'fn': 'I_DIG', 'args': ['FAST_START', 'PSUP', 'NSUP']}
{'fn': 'IO_DIG', 'args': ['TEST', 'PSUP', 'NSUP']}
{'fn': 'I_ANA', 'args': ['IBIAS_200N']}
{'fn': 'SUP_ANA', 'args': ['NSUP']}
{'fn': 'I_ANA', 'args': ['VREF']}
{'fn': 'I_VEC_DIG', 'args': ['1', '0', 'DEGEN_TRIM', 'PSUP', 'NSUP']}
{'fn': 'I_VEC_DIG', 'args': ['1', '0', 'GAIN_SEL', 'PSUP', 'NSUP']}
{'fn': 'O_ANA', 'args': ['IOUTN']}
{'fn': 'O_ANA', 'args': ['IOUTP']}
{'fn': 'O_VEC_ANA', 'args': ['1', '0', 'IBIAS_OUT']}
{'fn': 'O_VEC_ANA', 'args': ['1', '0', 'ICAL']}
{'fn': 'O_DIG', 'args': ['OUT', 'PSUP', 'NSUP']}
{'fn': 'IO_ANA', 'args': ['TEST2']}
{'fn': 'IO_DIG', 'args': ['TEST3', 'PSUP_HV', 'NSUP_HV']}
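
If a single pass over the whole file contents is preferred, one pattern can still do the job as long as it captures the argument list as a whole and splits it afterwards, so the number of arguments no longer matters. This is only a sketch: it assumes the file has been read into one string and that every macro line starts with a backtick, and SAMPLE, PATTERN and parse_macros are illustrative names, not from the question.

```python
import re

# Assumed sample: a few lines copied from the question's file.
SAMPLE = """`I_DIG(DN, PSUP, NSUP)
`I_ANA(IBIAS_200N)
Random text
`O_VEC_ANA(1, 0, ICAL)
`SUP_ANA(NSUP)
"""

# Group 1: macro name starting with one of the four prefixes.
# Group 2: everything between the parentheses, split into arguments later.
PATTERN = re.compile(r"^`((?:I|IO|O|SUP)_\w+)\s*\(([^)]*)\)", re.M)

def parse_macros(text):
    """Return a list of (name, args) pairs, one per matching line."""
    return [
        (m.group(1), [arg.strip() for arg in m.group(2).split(",")])
        for m in PATTERN.finditer(text)
    ]

for name, args in parse_macros(SAMPLE):
    print(name, args)
```

Because the arguments are split after matching, lines with one, three or five arguments are all handled by the same pattern.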

Upvotes: 1

G_M

Reputation: 3382

If all you need is to check what a line starts with, why not use str.startswith? It accepts a tuple of prefixes, so no regular expressions are needed.

The code below reads from the file you linked to:

>>> with open("test.vams", "r") as f:
...     for line in f:
...         if line.startswith(('`I_', '`IO_', '`O_', '`SUP_')):
...             fn, args = line.strip('`)\n').split('(')
...             args = [arg.strip() for arg in args.split(',')]
...             print({'fn': fn, 'args': args})
... 
{'fn': 'SUP_ANA', 'args': ['NSUP']}
{'fn': 'SUP_ANA', 'args': ['PSUP']}
{'fn': 'I_DIG', 'args': ['SEL', 'PSUP', 'NSUP']}
{'fn': 'I_ANA', 'args': ['A']}
{'fn': 'O_ANA', 'args': ['B']}

Upvotes: 1

sjw

Reputation: 6543

Here's a non-regex solution; the variable data holds the input string read from the file:

prefixes = {'I', 'IO', 'O', 'SUP'}
lines = [line for line in data.split('\n') if '_' in line and 
         line.strip('`').split('_')[0] in prefixes]
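
To see what this filter keeps, here is a small self-contained check. The sample string and the function name filter_macro_lines are made up for illustration; the filtering expression is the one above.

```python
def filter_macro_lines(data, prefixes=frozenset({"I", "IO", "O", "SUP"})):
    """Keep lines whose text before the first underscore is a known prefix."""
    return [line for line in data.split("\n")
            if "_" in line and line.strip("`").split("_")[0] in prefixes]

# Assumed sample: a few lines in the style of the question's file.
sample = "`I_DIG(DN, PSUP, NSUP)\nRandom text\n`SUP_ANA(NSUP)\n`O_VEC_ANA(1, 0, ICAL)"

for line in filter_macro_lines(sample):
    print(line)
```

The result is the list of matching lines, still unparsed and still carrying the backtick. Note that the check does not require the backtick, so a plain line such as I_note would pass as well.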

Upvotes: 1
