xavier_2639

Reputation: 1

python3 - json.loads for a string that contains " in a value

I'm trying to convert a string that contains a dict into a dict object using json, but some of the values contain a literal " character. For example:

import json
string = '{"key1":"my"value","key2":"my"value2"}'
js = json.loads(string, strict=False)

It raises json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 13 (char 12), because " is the string delimiter in JSON and the extra, unescaped " ends the value early.
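For reference, here is a minimal sketch that reproduces the error and shows where the parser stopped. It uses only the standard json module; json.JSONDecodeError exposes msg and pos (the 0-based index of the failure). Passing strict=False does not help here: it only allows control characters such as literal newlines inside strings and does not relax the rules for quotes.

```python
import json

broken = '{"key1":"my"value","key2":"my"value2"}'

# Try both strict modes and keep the exception raised in each case
errors = {}
for strict in (True, False):
    try:
        json.loads(broken, strict=strict)
    except json.JSONDecodeError as exc:
        errors[strict] = exc

for strict, exc in errors.items():
    # exc.pos points at the character right after the prematurely closed "my"
    print(f"strict={strict}: {exc.msg} at char {exc.pos} ({broken[exc.pos]!r})")
```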

What is the best way to achieve this?

The solution I have found is to call .replace several times: replace the legitimate " characters with placeholder patterns, replace the illegal " characters that remain with another pattern, then restore the legitimate ". After that I can use json.loads and finally replace the remaining pattern in the values with the illegal ". But there must be a better way.

Example:

import json
string = '{"key1":"my"value","key2":"my"value2"}'
string = string.replace('{"','__pattern_1')
string = string.replace('"}','__pattern_2')
...
...
string = string.replace('"','__pattern_42')
string = string.replace('__pattern_1','{"')
string = string.replace('__pattern_2','"}')
...
...
js = json.loads(string, strict=False)

Upvotes: 0

Views: 776

Answers (3)

jrudolf

Reputation: 56

The problem is that the string is not valid JSON.

In '{"key1": "my"value", "key2": "my"value2"}', the value of key1 ends after "my", and the trailing characters value" violate the format.

You can escape the inner quotes; valid JSON would look like:

{"key1": "my\"value", "key2": "my\"value2"}

Since you are defining it as a Python string literal, you also need to escape the backslashes themselves:

string = '{"key1": "my\\"value", "key2": "my\\"value2"}'
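A short check of the escaped version, as a sketch: the doubled-backslash literal above and a raw string (r'...', an alternative not mentioned in the answer) produce the same text, json.loads turns \" back into a plain " inside the value, and json.dumps adds the escapes itself when you build the JSON from a dict.

```python
import json

escaped = '{"key1": "my\\"value", "key2": "my\\"value2"}'
# A raw string keeps the single backslash as-is, so it is the same text
raw = r'{"key1": "my\"value", "key2": "my\"value2"}'
print(escaped == raw)  # True

data = json.loads(escaped)
print(data["key1"])  # my"value

# When the data starts out as a dict, json.dumps does the escaping for you
print(json.dumps({"key1": 'my"value'}))  # {"key1": "my\"value"}
```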

There is plenty of educational material online on character escaping; I recommend checking it out if anything is unclear.


Edit: If you insist on fixing the string in code (which I don't recommend, see comment), you can do something like this:

import re
import json
string = '{"key1":"my"value","key2":"my"value2"}'

# finds the contents of keys and values, assuming that the real closing double quote
# of each key/value is followed by one of the following characters: ,}:]
m = re.finditer(r'"([^:]+?)"(?:[,}:\]])', string)

new_string = string

for i in reversed(list(m)):
    ss, se = i.span(1)  # first group holds the content
    # escape the double quotes in the content and put everything back together
    # note that this is not efficient; a larger number of replacements would call for a different way of concatenating
    new_string = new_string[:ss] + new_string[ss:se].replace('"', '\\"') + new_string[se:]

json.loads(new_string)

This assumes that the real closing double quotes are followed by one of ,:}]. In other cases (for example, whitespace between a closing quote and the next delimiter) this won't work.
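The same heuristic can be sketched with re.sub and a replacement function, which builds the result in one pass and avoids the repeated slicing mentioned in the code comment. It keeps the same assumption (a real closing quote is directly followed by one of ,}:]); the helper name escape_inner_quotes is hypothetical, and the delimiter is captured in a second group so it can be written back.

```python
import json
import re

# Same assumption as above: a real closing quote is followed by one of , } : ]
# Group 1 is the key/value content, group 2 is the delimiter after the quote.
_TOKEN = re.compile(r'"([^:]+?)"([,}:\]])')


def escape_inner_quotes(text: str) -> str:
    """Hypothetical helper: escape stray double quotes inside keys/values."""
    def _fix(match):
        content = match.group(1).replace('"', '\\"')
        return f'"{content}"{match.group(2)}'

    # re.sub builds the new string in a single pass instead of re-slicing it
    return _TOKEN.sub(_fix, text)


broken = '{"key1":"my"value","key2":"my"value2"}'
print(json.loads(escape_inner_quotes(broken)))
# {'key1': 'my"value', 'key2': 'my"value2'}
```

If the assumption does not hold, for example '{"a": "x"y" }' with a space before }, the helper leaves that value untouched and json.loads still fails.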

Upvotes: 0

Rohan Chavan

Reputation: 11

This should work. What I am doing here is replacing all the expected double quotes with placeholders, removing the remaining (unwanted) double quotes, and then converting the placeholders back. Note that the unwanted quotes are dropped rather than kept, as the output shows.

import re
import json

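def fix_json_string(st):
    # 1. replace the structural quotes with placeholders
    st = re.sub(r'","',"!!",st)
    st = re.sub(r'":"',"--",st)
    st = re.sub(r'{"',"{{",st)
    st = re.sub(r'"}',"}}",st)
    # 2. every quote left is an unwanted one: delete it
    st = st.replace('"','')
    # 3. restore the structural quotes
    st = re.sub(r'}}','"}',st)
    st = re.sub(r'{{','{"',st)
    st = re.sub(r'--','":"',st)
    st = re.sub(r'!!','","',st)
    return st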

broken_string = '{"key1":"my"value","key2":"my"value2"}'
fixed_string = fix_json_string(broken_string)
print(fixed_string)
js = json.loads(fixed_string)  # parse with json rather than eval, which would execute arbitrary code
print(json.dumps(js))


Output:
{"key1":"myvalue","key2":"myvalue2"} # str
{"key1": "myvalue", "key2": "myvalue2"} # converted to json
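If you want to keep the stray quotes instead of deleting them, a small variation of the same placeholder idea can escape them. This is a sketch: the helper name escape_stray_quotes is hypothetical, it uses str.replace because all the patterns are literal, and it inherits the assumptions of fix_json_string (a flat object with only string values, no whitespace around : and ,, and none of the placeholders !!, --, {{, }} occurring in the data).

```python
import json

# Structural quote patterns and their placeholders (hypothetical choice);
# assumes none of the placeholders appear inside the real keys or values.
STRUCTURAL = [('","', "!!"), ('":"', "--"), ('{"', "{{"), ('"}', "}}")]


def escape_stray_quotes(st: str) -> str:
    # 1. hide the quotes that belong to the JSON structure
    for token, placeholder in STRUCTURAL:
        st = st.replace(token, placeholder)
    # 2. every quote left is part of a value: escape it instead of deleting it
    st = st.replace('"', '\\"')
    # 3. put the structural quotes back, in reverse order
    for token, placeholder in reversed(STRUCTURAL):
        st = st.replace(placeholder, token)
    return st


broken = '{"key1":"my"value","key2":"my"value2"}'
print(json.loads(escape_stray_quotes(broken)))
# {'key1': 'my"value', 'key2': 'my"value2'}
```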

Upvotes: 1

Cham

Reputation: 116

The variable string is not a valid JSON string. The correct string should be:

string = '{"key1":"my\\"value","key2":"my\\"value2"}'

Upvotes: 0
