Reputation: 2035
I have this string I want to load as a dictionary so that I can access each and every item.
my_str = """[
id=xyz-111,
abc= {
item=[
{
a=xyz,
b=123,
c={},
d={
i=[{ip=0.0.0.0/0}]
},
}
]
}
]"""
Currently I am using regex (re library) to get the value of any item in the string, which works.
Is there any cleaner way to convert this string to a dictionary? I have tried json.loads() and ast, which do not work.
Expected result:
my_dict = {
'id':'xyz-111',
'abc': {
'item':[
{
'a':'xyz',
'b':123,
'c':{},
'd':{
'i':[{'ip':'0.0.0.0/0'}]
},
}
]
}
}
Upvotes: 2
Views: 321
Reputation: 15130
Well, this is pretty ugly but it may give you a starting place from which to create a more efficient solution. Basically, it is a series of substitutions; the first one also slices off the outer square brackets and replaces them with curly braces so the whole thing reads as a dict. Then ast.literal_eval converts the result to a dict.
import ast
import re
s = """
[
id=xyz-111,
abc= {
item=[
{
a=xyz,
b=123,
c={},
d={
i=[{ip=0.0.0.0/0}]
},
}
]
}
]
"""
a = '{' + re.sub(r'=', r':', re.sub(r'\s+', '', s))[1:-1] + '}'
b = re.sub(r'([{}[\]:,])([^{}[\]:,])', r'\1"\2', a)
c = re.sub(r'([^{}[\]:,])([{}[\]:,])', r'\1"\2', b)
d = ast.literal_eval(c)
print(d)
# {'id': 'xyz-111', 'abc': {'item': [{'a': 'xyz', 'b': '123', 'c': {}, 'd': {'i': [{'ip': '0.0.0.0/0'}]}}]}}
a removes all whitespace, replaces = with :, and replaces the outer [] with {} (the whitespace removal is a blunt instrument and would need to be targeted more narrowly if the data contained strings with spaces that had to be preserved)
b inserts " after any bracket, colon or comma that is not followed by another of those characters
c inserts " before any bracket, colon or comma that is not preceded by another of those characters
d converts the string to a dict using ast.literal_eval, which is slightly more forgiving than json.loads
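One place where that forgiveness shows up is the trailing comma: the input contains "},}", and the comma survives the substitutions. A minimal check, using a shortened stand-in for the quoted string (the name quoted is only for illustration), suggests how the two parsers differ:

```python
import ast
import json

# Shortened stand-in for the quoted string; note the trailing comma before the last }.
quoted = '{"a":"xyz","c":{},}'

# ast.literal_eval parses Python literals, and Python allows a trailing comma.
parsed = ast.literal_eval(quoted)
print(parsed)

# json.loads follows the JSON grammar, which does not allow one.
try:
    json.loads(quoted)
    json_ok = True
except json.JSONDecodeError as exc:
    json_ok = False
    print("json.loads failed:", exc)
```

So if you want to end up with json.loads instead, the trailing commas would have to be removed first.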
Upvotes: 3
Reputation: 20560
I agree with you that ordinarily json.loads() would be the first choice for ingesting that. Where did that string come from?
It appears that layer_1 of some piece of code produced well-formed JSON, and then layer_2 stripped out quotes. Find layer_2, and tell it to stop doing that. Or, replicate layer_2, have your own code consume the original inputs and do a better job of processing it, so the quotes don't get lost.
There certainly is some structure remaining there,
between punctuation and line endings,
so in the Worst Case it would be worth your while to hack together an UnStrip
routine that puts back the missing quotes.
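A rough sketch of what such an UnStrip routine might look like follows. The name unstrip is hypothetical, and the sketch assumes that no key or value contains whitespace or any of [ ] { } = , and that the outermost [...] wraps key=value pairs, as in the question. It is a starting point, not a general parser.

```python
import json
import re

def unstrip(text):
    """Hypothetical UnStrip helper: put quotes back so json.loads accepts the text."""
    # Assumes no key or value contains whitespace or any of [ ] { } = ,
    compact = re.sub(r"\s+", "", text)
    # Quote every run of characters that is not structural punctuation.
    quoted = re.sub(r"[^\[\]{}=,]+", lambda m: json.dumps(m.group(0)), compact)
    # The input uses = where JSON uses :
    quoted = quoted.replace("=", ":")
    # JSON rejects trailing commas such as the one in "},}".
    quoted = re.sub(r",(?=[\]}])", "", quoted)
    # Assumes the outermost [...] wraps key=value pairs, as in the question.
    if quoted.startswith("[") and quoted.endswith("]"):
        quoted = "{" + quoted[1:-1] + "}"
    return quoted

my_str = "[id=xyz-111, abc={item=[{a=xyz, b=123, c={}, d={i=[{ip=0.0.0.0/0}]},}]}]"
my_dict = json.loads(unstrip(my_str))
print(my_dict["abc"]["item"][0]["d"]["i"][0]["ip"])
```

Every value comes back as a string here, including '123'.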
In the case of e.g. b=123, it would not be so bad to emit 'b':'123',
as you can always post process,
where you recursively try to convert dictionary values to numbers,
with a try / except to ignore the error if the value turns out
to look more like 'xyz'
than some integer.
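That post-processing step could be sketched roughly like this. The helper names to_number and convert_numbers are made up for illustration. Note that int() and float() also accept strings such as '1_000' or 'nan', so a stricter check may be needed if your data can contain those.

```python
def to_number(value):
    """Return value as an int or float if it parses as one, else unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue  # not this kind of number, try the next cast
    return value

def convert_numbers(node):
    """Recursively convert numeric-looking strings inside dicts and lists."""
    if isinstance(node, dict):
        return {key: convert_numbers(val) for key, val in node.items()}
    if isinstance(node, list):
        return [convert_numbers(val) for val in node]
    if isinstance(node, str):
        return to_number(node)
    return node

data = {"id": "xyz-111", "abc": {"item": [{"a": "xyz", "b": "123", "c": {}}]}}
converted = convert_numbers(data)
print(converted)
```

Values such as 'xyz' and 'xyz-111' raise ValueError in both casts and are left as strings.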
Actually, the example of wrapping n = float(s) within a try is instructive.
There may be some ambiguity in any given line of the input,
with the option to try variant A or B as valid JSON.
It may be useful to attempt both, wrapped in a try,
and return the first one that wins,
first one that evaluates as valid JSON.
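As a small illustration of the "first one that wins" idea, here is a sketch for a single key=value line. The names first_valid_json and parse_pair are hypothetical. Variant A treats the value as a bare JSON token, variant B treats it as a string.

```python
import json

def first_valid_json(candidates):
    """Return the parsed value of the first candidate that is valid JSON."""
    for text in candidates:
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            continue  # this variant lost, try the next one
    raise ValueError("no candidate parsed as valid JSON")

def parse_pair(line):
    """Parse one 'key=value' line, preferring a bare JSON value over a string."""
    key, _, raw = line.partition("=")
    variant_a = "{%s: %s}" % (json.dumps(key), raw)  # value kept as a bare token
    variant_b = json.dumps({key: raw})               # value quoted as a string
    return first_valid_json([variant_a, variant_b])

print(parse_pair("b=123"))
print(parse_pair("a=xyz"))
```

Because variant A is tried first, a value like true or null would come back as a JSON literal and not as a string, which may or may not be what you want.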
Upvotes: 0