Shakeel

Reputation: 2035

Convert ugly string (custom non-json format) to dictionary

I have this string I want to load as a dictionary so that I can access each and every item.

my_str = """[
    id=xyz-111,
    abc= {
            item=[
                    {
                        a=xyz,
                        b=123,
                        c={},
                        d={
                        i=[{ip=0.0.0.0/0}]
                            },
                    }
            ]
    }
]"""

Currently I am using regex (re library) to get the value of any item in the string, which works.

Is there a cleaner way to convert this string to a dictionary? I have tried json.loads() and ast, neither of which works.

Expected result:

my_dict = {
    'id':'xyz-111',
    'abc': {
            'item':[
                    {
                        'a':'xyz',
                        'b':123,
                        'c':{},
                        'd':{
                        'i':[{'ip':'0.0.0.0/0'}]
                            },
                    }
            ]
    }
}

Upvotes: 2

Views: 321

Answers (2)

benvc

Reputation: 15130

Well, this is pretty ugly, but it may give you a starting place from which to create a more efficient solution. It is a series of regex substitutions: the first also slices off the outer square brackets and replaces them with curly braces so that the whole thing reads as a dict, and the next two put quotes around every key and value. Then ast.literal_eval converts the result to a dict.

import ast
import re

s = """
[
    id=xyz-111,
    abc= {
      item=[
        {
          a=xyz,
          b=123,
          c={},
          d={
            i=[{ip=0.0.0.0/0}]
          },
        }
      ]
    }
]
"""

a = '{' + re.sub(r'=', r':', re.sub(r'\s+', '', s))[1:-1] + '}'
b = re.sub(r'([{}[\]:,])([^{}[\]:,])', r'\1"\2', a)
c = re.sub(r'([^{}[\]:,])([{}[\]:,])', r'\1"\2', b)
d = ast.literal_eval(c)

print(d)
# {'id': 'xyz-111', 'abc': {'item': [{'a': 'xyz', 'b': '123', 'c': {}, 'd': {'i': [{'ip': '0.0.0.0/0'}]}}]}}
  • a removes all whitespace, replaces = with :, and replaces outer [] with {} (the whitespace removal is a blunt instrument and would need to be more specifically targeted if the data contained strings with spaces that needed to be preserved)
  • b inserts " after brackets, colons or commas not followed by any of those characters
  • c inserts " before brackets, colons or commas not preceded by any of those characters
  • d converts string to dict using ast.literal_eval, which is slightly more forgiving than json.loads (it accepts the trailing comma left after the d={...} entry, which json.loads would reject)
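
To illustrate the last bullet, here is a minimal sketch using a trimmed-down version of the quoted string (the name quoted and the shortened input are only for illustration, not part of the code above). It keeps the trailing comma that the original input has before the closing brace:

```python
import ast
import json

# Trimmed-down stand-in for the string built in step c.
# Note the trailing comma just before the final closing brace.
quoted = '{"a":"xyz","d":{"i":[{"ip":"0.0.0.0/0"}]},}'

# json.loads rejects the trailing comma.
try:
    json.loads(quoted)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

# ast.literal_eval parses it as a Python dict literal, where a
# trailing comma is allowed.
parsed = ast.literal_eval(quoted)

print(json_ok)
print(parsed)
```

If the trailing comma were stripped first, json.loads should work on this string as well.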

Upvotes: 3

J_H

Reputation: 20560

I agree with you that ordinarily json.loads() would be the first choice for ingesting that. Where did that string come from?

proper solution

It appears that layer_1 of some piece of code produced well-formed JSON, and then layer_2 stripped out the quotes. Find layer_2, and tell it to stop doing that. Or, replicate layer_2: have your own code consume the original inputs and do a better job of processing them, so the quotes don't get lost.

hackish solution

There certainly is some structure remaining there, between punctuation and line endings, so in the worst case it would be worth your while to hack together an UnStrip routine that puts back the missing quotes. For a case such as b=123, it would not be so bad to emit 'b':'123', since you can always post-process: recursively try to convert dictionary values to numbers, with a try / except to ignore the error when a value turns out to look more like 'xyz' than an integer.
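
A rough sketch of that post-processing step, assuming the UnStrip routine has already produced a dict whose leaf values are all strings. The function name coerce_numbers is made up for this example:

```python
def coerce_numbers(value):
    """Recursively convert numeric-looking strings to int or float."""
    if isinstance(value, dict):
        return {k: coerce_numbers(v) for k, v in value.items()}
    if isinstance(value, list):
        return [coerce_numbers(v) for v in value]
    if isinstance(value, str):
        # Try int first so '123' becomes 123 rather than 123.0.
        for cast in (int, float):
            try:
                return cast(value)
            except ValueError:
                pass  # not a number of this kind, try the next cast
    return value  # leave 'xyz', '0.0.0.0/0', etc. untouched


data = {'a': 'xyz', 'b': '123', 'c': {}, 'd': {'i': [{'ip': '0.0.0.0/0'}]}}
print(coerce_numbers(data))
```

One caveat: float() also accepts strings such as 'nan' and 'inf', so if those can appear as ordinary values you would want to special-case them.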

Actually, the example of wrapping n = float(s) within a try is instructive. There may be some ambiguity in any given line of the input, with the option to try variant A or variant B as valid JSON. It may be useful to attempt both, each wrapped in a try, and return the first one that evaluates as valid JSON.
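
Something along these lines, as a sketch only. The helper names first_valid_json and parse_pair are hypothetical, and it assumes the input has already been split into key / raw value pairs. Variant A keeps the value bare, variant B puts the quotes back:

```python
import json


def first_valid_json(candidates):
    """Return the parsed value of the first candidate that is valid JSON."""
    for text in candidates:
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            continue  # this variant lost, try the next one
    raise ValueError("no candidate parsed as valid JSON")


def parse_pair(key, raw):
    """Try the value bare (variant A), then quoted (variant B)."""
    variant_a = '{' + json.dumps(key) + ': ' + raw + '}'
    variant_b = '{' + json.dumps(key) + ': ' + json.dumps(raw) + '}'
    return first_valid_json([variant_a, variant_b])


print(parse_pair('b', '123'))
print(parse_pair('a', 'xyz'))
print(parse_pair('ip', '0.0.0.0/0'))
```

With this ordering a value like 123 stays a number because variant A wins, while xyz and 0.0.0.0/0 fall through to the quoted variant.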

Upvotes: 0
