Reputation: 12234
I am trying to build a regex that converts this string
'{a:bilby.core.prior.Uniform(-1,1,a, func=g(1, 2)),b:2}'
into
{"a": "Uniform(-1,1,a, func=g(1, 2))", "b": "2"}
Note that the values of the dictionary are still strings (they will be subsequently interpreted by another function)
So far, I've got this
>>> import re
>>> re.sub(r'([A-Za-z/\.0-9\-\+][^\[\],:"}]*)', r'"\g<1>"', '{a:bilby.core.prior.Uniform(-1,1,a, func=g(1, 2)),b:2}')
'{"a":"bilby.core.prior.Uniform(-1","1","a", "func=g(1", "2))","b":"2"}'
But the trouble is that it matched on the commas inside the parentheses. Is there a way to match only if not inside parentheses?
I'm aware that regex doesn't support nested parentheses. Is this an issue here?
Upvotes: 3
Views: 68
Reputation: 36623
As @Giacomo pointed out, nested parentheses require complex regular expressions. You will be better off rolling your own parser that can handle each case. You can still use regular expressions, but it does not have to be one-regex-to-rule-them-all.
import re

def comma_partition(s):
    """Partitions `s` at top-level commas"""
    s = s.strip('{').strip('}')
    in_parens = 0  # current parenthesis nesting depth
    ixs = []       # indices of commas found at depth 0
    for i, c in enumerate(s):
        if c == '(':
            in_parens += 1
        if c == ')':
            in_parens -= 1
        if not in_parens and c == ',':
            ixs.append(i)
    return [s[sc] for sc in make_partition_slices(ixs)]

def make_partition_slices(ixs):
    """Yields partitioning slices, skipping each index of `ixs`"""
    ix_x = [None] + ixs
    ix_y = ixs + [None]
    for x, y in zip(ix_x, ix_y):
        # compare against None so that a comma at index 0 is skipped too
        yield slice(x + 1 if x is not None else None, y)

def kv_parser(kv_str):
    """Takes a string in 'K:V' format and returns dictionary.
    Leading namespace in `V` is removed.
    """
    k, v = kv_str.split(':', 1)
    # raw string, so that `\.` reaches the regex engine as an escaped dot
    v = re.sub(r'^([A-Za-z_]([A-Za-z0-9_])*\.)+', '', v)
    return {k: v}
Above we define three functions. The first finds the indices of the top-level commas (those not inside parentheses) by simply counting the opening and closing parentheses. The second generates slices that partition the string to the left and right of each comma index. The third is the actual parser: it splits the key from the value and strips the value's leading namespace using a simple regular expression.
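The counting idea can also be folded into a single pass that collects the pieces directly. The following is only a sketch of that variation, not a replacement for the functions above: the name `split_top_level` is made up for illustration, and the check for unbalanced parentheses is an added assumption about how malformed input should be treated.

```python
def split_top_level(text, sep=','):
    """Split `text` at each `sep` that is not inside parentheses.

    Hypothetical helper: raises ValueError on unbalanced parentheses,
    which is an assumption about the desired behaviour.
    """
    depth = 0    # current parenthesis nesting depth
    start = 0    # index where the current piece begins
    pieces = []
    for i, ch in enumerate(text):
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:
                raise ValueError(f'unmatched ")" at index {i}')
        elif ch == sep and depth == 0:
            # top-level separator: close the current piece
            pieces.append(text[start:i])
            start = i + 1
    if depth != 0:
        raise ValueError('unclosed "(" in input')
    pieces.append(text[start:])
    return pieces


# drop the outer braces of the test string before splitting
inner = '{a:bilby.core.prior.Uniform(-1,1,a, func=g(1, 2)),b:2}'[1:-1]
print(split_top_level(inner))
# ['a:bilby.core.prior.Uniform(-1,1,a, func=g(1, 2))', 'b:2']
```

Each piece can then be handed to `kv_parser` unchanged.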
Below we run it on your test case.
s = '{a:bilby.core.prior.Uniform(-1,1,a, func=g(1, 2)),b:2}'
out = {}
for p in comma_partition(s):
    out.update(kv_parser(p))
out
# returns:
{'a': 'Uniform(-1,1,a, func=g(1, 2))', 'b': '2'}
It is more code, but it is much MUCH easier to modify and maintain than a complex regular expression.
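If you need the double-quoted text form shown in the question rather than a Python dict, the standard library's `json.dumps` should produce it from the parsed result. This is a small sketch that hard-codes the dictionary returned above; the variable names are only illustrative.

```python
import json

# the dictionary produced by the parsing step above, written out literally
parsed = {'a': 'Uniform(-1,1,a, func=g(1, 2))', 'b': '2'}

# json.dumps quotes keys and string values with double quotes
as_text = json.dumps(parsed)
print(as_text)
# {"a": "Uniform(-1,1,a, func=g(1, 2))", "b": "2"}
```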
Upvotes: 3