Convert ugly string (custom non-json format) to dictionary

Question

I have this string I want to load as a dictionary so that I can access each and every item.

my_str = """[
    id=xyz-111,
    abc= {
            item=[
                    {
                        a=xyz,
                        b=123,
                        c={},
                        d={
                        i=[{ip=0.0.0.0/0}]
                            },
                    }
            ]
    }
]"""

Currently I am using regex (re library) to get the value of any item in the string, which works.

Is there a cleaner way to convert this string to a dictionary? I have tried json.loads() and the ast module, but neither works on this format.

Expected result:

my_dict = {
    'id':'xyz-111',
    'abc': {
            'item':[
                    {
                        'a':'xyz',
                        'b':123,
                        'c':{},
                        'd':{
                        'i':[{'ip':'0.0.0.0/0'}]
                            },
                    }
            ]
    }
}

benvc · Accepted Answer

Well, this is pretty ugly, but it may give you a starting point for a more efficient solution. The idea is a series of regex substitutions: the first strips all whitespace, swaps = for :, and slices off the outer square brackets so they can be replaced with curly braces; the next two add the missing quotes around keys and values. ast.literal_eval then converts the result to a dict.

import ast
import re

s = """
[
    id=xyz-111,
    abc= {
      item=[
        {
          a=xyz,
          b=123,
          c={},
          d={
            i=[{ip=0.0.0.0/0}]
          },
        }
      ]
    }
]
"""

a = '{' + re.sub(r'=', r':', re.sub(r'\s+', '', s))[1:-1] + '}'
b = re.sub(r'([{}[\]:,])([^{}[\]:,])', r'\1"\2', a)
c = re.sub(r'([^{}[\]:,])([{}[\]:,])', r'\1"\2', b)
d = ast.literal_eval(c)

print(d)
# {'id': 'xyz-111', 'abc': {'item': [{'a': 'xyz', 'b': '123', 'c': {}, 'd': {'i': [{'ip': '0.0.0.0/0'}]}}]}}

a removes all whitespace, replaces = with :, and replaces the outer [] with {} (the whitespace removal is a blunt instrument and would need to be targeted more carefully if the data contained strings with spaces that had to be preserved)
b inserts " after any bracket, brace, colon or comma that is followed by a character other than one of those
c inserts " before any bracket, brace, colon or comma that is preceded by a character other than one of those
d converts the string to a dict using ast.literal_eval, which is slightly more forgiving than json.loads: the string still contains a trailing comma (the },} after d), which ast.literal_eval accepts and json.loads rejects. Every value comes back as a string, so b is '123' rather than 123.
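
The result above still has 'b': '123' as a string, while the expected result in the question has the integer 123. Here is a minimal sketch of a post-processing step, assuming that any value consisting only of digits (optionally with a leading minus sign) should become an int. coerce_ints is a hypothetical helper name, not part of the original answer, and it would also turn something like '007' into 7, which may or may not be what you want.

```python
import re

# Assumption: a value is "numeric" only if it is an optional minus sign
# followed by ASCII digits. Keys are left untouched.
INT_PATTERN = re.compile(r'-?[0-9]+')


def coerce_ints(value):
    """Recursively replace digit-only strings with ints (hypothetical helper)."""
    if isinstance(value, dict):
        return {key: coerce_ints(item) for key, item in value.items()}
    if isinstance(value, list):
        return [coerce_ints(item) for item in value]
    if isinstance(value, str) and INT_PATTERN.fullmatch(value):
        return int(value)
    return value


# The dict produced by ast.literal_eval in the answer above.
d = {'id': 'xyz-111', 'abc': {'item': [{'a': 'xyz', 'b': '123', 'c': {}, 'd': {'i': [{'ip': '0.0.0.0/0'}]}}]}}

print(coerce_ints(d))
# {'id': 'xyz-111', 'abc': {'item': [{'a': 'xyz', 'b': 123, 'c': {}, 'd': {'i': [{'ip': '0.0.0.0/0'}]}}]}}
```

Values such as 'xyz-111' and '0.0.0.0/0' are left alone because the pattern has to match the whole string.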
