IMCoins

Reputation: 3306

Matching outer braces group using regex

I'm trying to find a way to parse the following string into a list of strings using regex.

{"first_statement" : 1, "bleh" : { "some_data" : True } }, {"second_statement" : 2}

# Group 1:
{"first_statement" : 1, "bleh" : { "some_data" : True } }

# Group 2:
{"second_statement" : 2}

I want my regex to match the outermost pair of braces, no matter how many nested braces there are inside. For instance...

{"first_statement" : 1, "bleh" : { "some_data" : True, "foo" : { "bar" : { "zing" : False } } } }

# Group 1:
{"first_statement" : 1, "bleh" : { "some_data" : True, "foo" : { "bar" : { "zing" : False } } } }

I haven't got much experience with regex, but I tried a few things. The closest I got is a simple pattern, {.*?}, but it obviously ends the match at the first closing brace it encounters. All my other attempts failed; the closest was a .NET regex solution, but I couldn't get it to work in Python.

Is there even a way to do it using Python regex, or do I have to parse my string character by character using a simple loop? As far as I can tell from exploring the "All tokens" list on regex101, there is no simple way to achieve this.

Note : I don't care about the characters in between the first layer of braces. I want to ignore them.
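For reference, the character-by-character loop mentioned above could look roughly like the sketch below. It tracks brace depth and slices out each top-level group. The function name split_outer_braces is hypothetical, and the sketch assumes braces never appear inside quoted string values.

```python
def split_outer_braces(text):
    """Return each top-level {...} group in text as a substring."""
    groups = []
    depth = 0      # current brace nesting level
    start = None   # index of the opening brace of the current group
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i  # a new top-level group begins here
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                # back at the top level: the group is complete
                groups.append(text[start:i + 1])
    return groups


s = '{"first_statement" : 1, "bleh" : { "some_data" : True } }, {"second_statement" : 2}'
for group in split_outer_braces(s):
    print(group)
# {"first_statement" : 1, "bleh" : { "some_data" : True } }
# {"second_statement" : 2}
```

Everything between the outer braces is copied through untouched, which matches the requirement of ignoring the inner content.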

Upvotes: 0

Views: 131

Answers (2)

Graipher

Reputation: 7186

For the special case that your string is almost legal JSON, missing only the surrounding list brackets (which seems to be nearly the case here), you can add the brackets and try to parse it as JSON:

import json 
s = '{"first_statement" : 1, "bleh" : { "some_data" : "True" } }, {"second_statement" : 2}'
try:
    x = json.loads('[' + s + ']')
except json.JSONDecodeError:
    # do something?
    x = None
print(x)
# On Python 3.7+ (dicts keep insertion order) this prints:
# [{'first_statement': 1, 'bleh': {'some_data': 'True'}}, {'second_statement': 2}]

This is similar to adding the brackets and parsing the string with ast.literal_eval, as suggested by @jpp in his answer, but it is a bit stricter about what it accepts (the string must be legal JSON apart from the missing list brackets). Note, for example, that I had to put quotes around True to make it valid JSON.
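To illustrate the difference in strictness, here is a small sketch that feeds the original, unquoted string from the question to both parsers. The variable names parsed and json_ok are only for illustration.

```python
import json
from ast import literal_eval

# The string exactly as given in the question, with an unquoted True
s = '{"first_statement" : 1, "bleh" : { "some_data" : True } }, {"second_statement" : 2}'

# literal_eval accepts it, because True is a valid Python literal
parsed = literal_eval('[' + s + ']')

# json.loads rejects it, because JSON only knows the lowercase literal true
try:
    json.loads('[' + s + ']')
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

print(parsed[0]["bleh"]["some_data"], json_ok)
# True False
```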

Upvotes: 0

jpp

Reputation: 164713

One way without regex is to use ast.literal_eval:

from ast import literal_eval

# Adjacent string literals inside parentheses are joined into one string
mystr = ('{"first_statement" : 1, "bleh" : { "some_data" : True } }, '
         '{"second_statement" : 2}')

lst = list(map(str, literal_eval('['+mystr+']')))

# ["{'first_statement': 1, 'bleh': {'some_data': True}}",
#  "{'second_statement': 2}"]

Upvotes: 1
