Reputation: 17435
This might be a rookie question, but how would you efficiently apply regular expressions to strings to retrieve parts of them? I have a concrete parsing problem as an example.
If I have the string:
STRING[(val1)-{val2}-!val3!]
or
STRING[{val2}-!val3!-(val1)]
or
STRING[!val3!-(val1)-{val2}]
etc... (all 6 permutations of !val3!, (val1), {val2}, which I won't bother you with).
I can do several easy finds and splits on that to achieve my purpose, but how would you do it in a clean and safe way to get something like:
val1,val2,val3 = ...
?
UPDATE: in the string to parse, val1, val2 and val3 are variable inputs, and the three delimiter pairs !! () {} are there to tell me where the three values are; each value has a different semantic meaning.
Upvotes: 0
Views: 589
Reputation: 27575
import re
regx = re.compile(r'(?<=[\[-])[!({](.+?)[!)}](?=[\]-])')
for ss in ('STRING[(val1)-{val2}-!val3!]',
           'STRING[{val2}-!val3!-(val1)]',
           'STRING[!val3!-(val1)-{val2}]',
           'STRING[!val3!-!val1!-{val2}]'):
    print(regx.findall(ss))
result
['val1', 'val2', 'val3']
['val2', 'val3', 'val1']
['val3', 'val1', 'val2']
['val3', 'val1', 'val2']
But the input string must strictly respect the pattern: no whitespace, and inside the desired values none of the characters ! ( { } ) may be directly followed or preceded by [ or - (when they are not so preceded or followed, the assertions in my pattern handle those cases correctly), etc.
If necessary, the regex pattern can be improved.
I improved the pattern so that a value preceded by '!' must end with '!',
a value preceded by '(' must end with ')',
and a value preceded by '{' must end with '}':
import re
regx_1 = re.compile(r'(?<=[\[-])'
                    r'[!({]'
                    r'(.+?)'
                    r'[!)}]'
                    r'(?=[\]-])')
regx_2 = re.compile(r'(?<=[\[-])'
                    r'(?:'
                    r'(!)'
                    r'|'
                    r'(\()'
                    r'|'
                    r'(\{)'
                    r')'
                    r'(.+?)'
                    r'(?(1)!|(?(2)\)|(?(3)\})))'  # the closer must match the opener
                    r'(?=[\]\-])'
                    )
for ss in ('STRING[(val1)-{val2}-!val3!]',
           'STRING[{val2}-!val3!-(val1)]',
           'STRING[!val3!-(val1)-{val2}]',
           'STRING[!val3!-!val1!-{val2}]',
           'STRING[!va}-l3!-!val)-1!-{va!-)-l2}]',):
    print(('------------------------------\n'
           '%s\n\n%s\n\n%s\n%s\n') % (ss,
                                      regx_1.findall(ss),
                                      regx_2.findall(ss),
                                      [x[3] for x in regx_2.findall(ss)]))
result
------------------------------
STRING[(val1)-{val2}-!val3!]
['val1', 'val2', 'val3']
[('', '(', '', 'val1'), ('', '', '{', 'val2'), ('!', '', '', 'val3')]
['val1', 'val2', 'val3']
------------------------------
STRING[{val2}-!val3!-(val1)]
['val2', 'val3', 'val1']
[('', '', '{', 'val2'), ('!', '', '', 'val3'), ('', '(', '', 'val1')]
['val2', 'val3', 'val1']
------------------------------
STRING[!val3!-(val1)-{val2}]
['val3', 'val1', 'val2']
[('!', '', '', 'val3'), ('', '(', '', 'val1'), ('', '', '{', 'val2')]
['val3', 'val1', 'val2']
------------------------------
STRING[!val3!-!val1!-{val2}]
['val3', 'val1', 'val2']
[('!', '', '', 'val3'), ('!', '', '', 'val1'), ('', '', '{', 'val2')]
['val3', 'val1', 'val2']
------------------------------
STRING[!va}-l3!-!val)-1!-{va!-)-l2}]
['va', 'val', 'va']
[('!', '', '', 'va}-l3'), ('!', '', '', 'val)-1'), ('', '', '{', 'va!-)-l2')]
['va}-l3', 'val)-1', 'va!-)-l2']
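Since the question asks for val1, val2 and val3 according to their delimiters rather than their position, the conditional pattern above could be adapted with named groups. What follows is only a sketch under that assumption: PATTERN and parse_by_delimiter are illustrative names that do not appear in the answer, and a delimiter type that occurs twice (as in the fourth test string) simply overwrites the earlier value.

```python
import re

# Hypothetical names, for illustration only.
# Each opener is captured in a group named after the value it introduces.
PATTERN = re.compile(
    r'(?<=[\[-])'
    r'(?:(?P<val3>!)|(?P<val1>\()|(?P<val2>\{))'  # which opener matched
    r'(?P<value>.+?)'
    r'(?(val3)!|(?(val1)\)|\}))'                  # closer must match the opener
    r'(?=[\]-])'
)


def parse_by_delimiter(text):
    """Return a dict such as {'val1': ..., 'val2': ..., 'val3': ...}."""
    result = {}
    for m in PATTERN.finditer(text):
        for name in ('val1', 'val2', 'val3'):
            if m.group(name):  # this opener took part in the match
                result[name] = m.group('value')
    return result


print(parse_by_delimiter('STRING[!hello!-(world)-{zup}]'))
```

The same restrictions on the input apply as for regx_2; only the way the result is labelled changes.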
Upvotes: 1
Reputation: 63797
To parse your string into a dictionary with "val1", "val2" and "val3" as keys and the found strings as the corresponding values, you could use something like the snippet below.
import re
examples = '''STRING[!hello!-(world)-{zup}]
STRING[{foobar}-!stack!-(overflow)]
STRING[(this)-{might}-!work!]'''
# Map each opening delimiter to the name of the value it encloses
name_dict = {
    '(': 'val1',
    '{': 'val2',
    '!': 'val3'
}
for testcase in examples.split('\n'):
    result = {}
    # testcase[7:-2] drops the leading "STRING[" and the last closer plus "]"
    for x in re.split('[!)}]-', testcase[7:-2]):
        result[name_dict[x[0]]] = x[1:]
    print(result)
output (key order may vary by Python version)
{'val3': 'hello', 'val2': 'zup', 'val1': 'world'}
{'val3': 'stack', 'val2': 'foobar', 'val1': 'overflow'}
{'val3': 'work', 'val2': 'might', 'val1': 'this'}
Upvotes: 1
Reputation: 336108
You can use lookahead assertions with capturing groups to find matches in any order:
import re
def matchany(subject):
    match = re.match(
        r"""
        (?=.*\(([^()]*)\))  # Find match between () --> group 1
        (?=.*\{([^{}]*)\})  # Find match between {} --> group 2
        (?=.*!([^!]*)!)     # Find match between !! --> group 3""",
        subject, re.VERBOSE)
    return match.groups() if match else None
Then you can do
>>> matchany("STRING[(val1)-{val2}-!val3!]")
('val1', 'val2', 'val3')
>>> matchany("STRING[!val3!-(val1)-{val2}]")
('val1', 'val2', 'val3')
>>> matchany("STRING[(val1)-{val2}-!val3]")
>>>
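If a dictionary is more convenient than a positional tuple, the same lookahead idea could be written with named groups. This is a minimal sketch, not part of the answer above: LOOKAHEADS and matchany_named are hypothetical names, and the group names val1, val2, val3 are taken from the question.

```python
import re

# Hypothetical variant of matchany() that uses named groups.
LOOKAHEADS = re.compile(
    r"""
    (?=.*\((?P<val1>[^()]*)\))   # text between ( and )
    (?=.*\{(?P<val2>[^{}]*)\})   # text between { and }
    (?=.*!(?P<val3>[^!]*)!)      # text between ! and !
    """,
    re.VERBOSE)


def matchany_named(subject):
    """Return a dict of the three values, or None if any of them is missing."""
    match = LOOKAHEADS.match(subject)
    return match.groupdict() if match else None


print(matchany_named("STRING[!c!-(a)-{b}]"))
```

As with matchany, all three lookaheads must succeed, so a string with a missing or unterminated value yields None.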
Upvotes: 4
Reputation: 99
You can do it with a regular expression.
This example does not work if your values contain any of the characters that end a value: '!', '}', or ')':
import re
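s = 'STRING[(val1)-{val2}-!val3!]'
# Keep only the part between "STRING[" and the final "]"
inner = s[s.index('[') + 1:-1]
# Strip each value's delimiters, then split; values come back in string order
val1, val2, val3 = re.sub(r'[!{(](?P<value>.+?)[!})](?P<delimiter>-|$)', r'\g<value>\g<delimiter>', inner).split('-')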
Upvotes: 0
Reputation: 13251
Do you really need a regex? Based on your input, the - looks like a separator, so:
>>> s = "STRING[(val1)-{val2}-!val3!]"
# First step: remove the "STRING[" prefix and the "]" suffix from the source string
>>> s = s[7:-1]
>>> print(s)
(val1)-{val2}-!val3!
# Then split on '-' and drop the first and last character of each part (the (), {} or !! delimiters)
# This works for any values, as long as they do not contain '-'
>>> [x[1:-1] for x in s.split('-')]
['val1', 'val2', 'val3']
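The two steps above return the values in string order. To label them by delimiter instead, as the question asks, they could be wrapped in a small helper. This is only a sketch: parse_by_split, NAMES and CLOSERS are hypothetical names, and it still assumes that no value contains '-'.

```python
# Hypothetical helper built on the slicing and splitting steps shown above.
NAMES = {'(': 'val1', '{': 'val2', '!': 'val3'}
CLOSERS = {'(': ')', '{': '}', '!': '!'}


def parse_by_split(s):
    """Split the bracketed part on '-' and label each value by its opener."""
    inner = s[s.index('[') + 1:s.rindex(']')]
    result = {}
    for part in inner.split('-'):
        # Each part must be an opener, a value, and the matching closer
        if len(part) < 2 or CLOSERS.get(part[0]) != part[-1]:
            raise ValueError('unexpected delimiters in %r' % part)
        result[NAMES[part[0]]] = part[1:-1]
    return result


print(parse_by_split("STRING[{b}-!c!-(a)]"))
```

Checking the closing character costs one comparison and catches malformed input such as "(val1}" early.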
Upvotes: 1
Reputation: 43031
To get a truly complete answer, you should clarify further what the possible input strings are and what the format of the string to match is. However, here is an example illustrating at least the basic matching mechanism with re.findall:
>>> import re
>>> examples = '''STRING[(val1)-{val2}-!val3!]
... STRING[{val2}-!val3!-(val1)]
... STRING[!val3!-(val1)-{val2}]
... '''
>>> for line in examples.splitlines():
...     print(re.findall(r'val\d', line))
...
['val1', 'val2', 'val3']
['val2', 'val3', 'val1']
['val3', 'val1', 'val2']
EDIT: the regex r'val\d' matches every substring formed by the letters val followed by one, and only one, digit.
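To make the "one, and only one digit" behaviour concrete, here is a small sketch with made-up inputs (the sample string is an assumption, not taken from the question). Note that this pattern only finds values that are literally named val plus a digit; since the question's update says the values are variable, a delimiter-based pattern would be needed for real input.

```python
import re

pattern = re.compile(r'val\d')

# 'val1'  matches in full
# 'val22' matches only its first four characters, 'val2'
# 'value' and 'valX' do not match: no digit follows 'val'
print(pattern.findall('val1 val22 value valX'))  # ['val1', 'val2']
```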
HTH!
Upvotes: 1