Handling Invalid Json

Question

I am receiving malformed JSON: the "text" key can contain user comments, so its value may include unescaped double quotes. I need to repair the JSON so that it can be parsed.

{"test":[{"id":"1234","user":{"id":"1234"},"text":"test, "." test " 1234"","created":"2019-01-09"}]}

I tried the code below from another thread, but I could not get it to work.

import json, re

while True:
    try:
        result = json.loads(test.json)   # try to parse...
        break                    # parsing worked -> exit loop
    except Exception as e:
        # "Expecting , delimiter: line 34 column 54 (char 1158)"
        # position of unexpected character after '"'
        unexp = int(re.findall(r'$char (\d+)$', str(e))[0])
        # position of unescaped '"' before that
        unesc = s.rfind(r'"', 0, unexp)
        s = s[:unesc] + r'"' + s[unesc+1:]
        # position of correspondig closing '"' (+2 for inserted '\')
        closg = s.find(r'"', unesc + 2)
        s = s[:closg] + r'"' + s[closg+1:]
print result

Traceback (most recent call last):
  File "test.py", line 10, in <module>
    unexp = int(re.findall(r'$char (\d+)$', str(e))[0])
IndexError: list index out of range

Expected result (the "text" value with its inner double quotes escaped):

{"test":[{"id":"1234","user":{"id":"1234"},"text":"test \".\" test \" 1234\"","created":"2019-01-09"}]}

Or we could remove all double quotes after "text": and before "created", then enclose the value of the "text" key in a single starting and ending ", which would also solve my issue:

{"test":[{"id":"1234","user":{"id":"1234"},"text":"test . test 1234","created":"2019-01-09"}]}

C.Nivs · Accepted Answer

You only need to edit that one line, so you can use a regex to match it, escape the quotes in its value, and then join it back with the rest of the JSON string so it can be parsed:

import re
import json

json_str = '''{
  "test": [
    {
      "id": "1234",
      "user": {
        "id": "1234"
      },
      "text": "test "." test " 1234"",
      "created": "2019-01-09"
    }
  ]
}'''

lines = []
# match the line that holds the "text" key
text_line = re.compile(r'^\s+"text"')

for line in json_str.split('\n'):
    # if a match happens, this will execute and fix the "text" line
    if text_line.match(line):
        # split on the first colon only, in case the comment contains one
        k, v = line.split(':', 1)
        # the slice drops the opening double quote and the trailing comma;
        # every remaining quote (including the old closing one) is escaped
        v = '"%s",' % v.strip()[1:-1].replace('"', '\\"')
        line = '%s: %s' % (k, v)
    # otherwise, carry on
    lines.append(line)

print('\n'.join(lines))

{
  "test": [
    {
      "id": "1234",
      "user": {
        "id": "1234"
      },
      "text": "test \".\" test \" 1234\"\"",
      "created": "2019-01-09"
    }
  ]
}

# Now you can parse it with json.loads
json.loads('\n'.join(lines))

{'test': [{'id': '1234', 'user': {'id': '1234'}, 'text': 'test "." test " 1234""', 'created': '2019-01-09'}]}

EDIT: OP has indicated the JSON is on a single line

There is room for optimization, but you can find all of the keys in your JSON using re, and then fix the value in a similar fashion as before:

import re
import json

# Now all one line
s = '''{"test":[{"id":"1234","user":{"id":"1234"},"text":"test, "." test " 1234"","created":"2019-01-09"}]}'''

# find our keys which will serve as our placeholders
keys = re.findall(r'"\w+":', s)

# ['"test":', '"id":', '"user":', '"id":', '"text":', '"created":']

# now we can find the indices for those keys to mark start
# and finish locations to extract the value
start, finish = s.index(keys[-2]), s.index(keys[-1])

# split on the first colon only, in case the comment contains one
k, v = s[start:finish].split(':', 1)
# replace v as before
v = '"%s",' % v.strip()[1:-1].replace('"', '\\"')
# '"test, \".\" test \" 1234\"\"",'

# replace string since it's immutable
s = s[:start] + '%s: %s' % (k, v) + s[finish:]

json.loads(s)
# {'test': [{'id': '1234', 'user': {'id': '1234'}, 'text': 'test, "." test " 1234""', 'created': '2019-01-09'}]}

Note that this works for this particular use case. I can try to work out a more general approach, but this will at least get you off the ground.
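
For what a somewhat more general approach might look like, here is a minimal sketch. It is not a full JSON repair tool: it assumes you know which key holds the free text and which key follows it, and that the delimiter `","created":` never appears inside a comment. The function name `escape_quotes_in_value` and its parameters `key` and `next_key` are hypothetical names chosen for this example. Unlike the slicing above, it treats the last quote before the next key as the closing quote, so the parsed text ends in a single `"`.

```python
import json
import re


def escape_quotes_in_value(raw, key, next_key):
    """Escape bare double quotes inside the string value of `key`.

    Hypothetical helper. Assumes the value of `key` is immediately
    followed by `,"<next_key>":` and that this delimiter does not occur
    inside the value itself.
    """
    pattern = re.compile(
        r'("%s"\s*:\s*")(.*?)("\s*,\s*"%s"\s*:)'
        % (re.escape(key), re.escape(next_key)),
        re.DOTALL,
    )

    def fix(match):
        # Escape only the quotes not already preceded by a backslash.
        body = re.sub(r'(?<!\\)"', r'\\"', match.group(2))
        return match.group(1) + body + match.group(3)

    return pattern.sub(fix, raw)


s = '{"test":[{"id":"1234","user":{"id":"1234"},"text":"test, "." test " 1234"","created":"2019-01-09"}]}'

fixed = escape_quotes_in_value(s, "text", "created")
data = json.loads(fixed)
print(data["test"][0]["text"])
```

Because the pattern is applied with `sub`, every object in the string that has a "text" key followed by a "created" key gets the same treatment, so it should also cope with several comments in one payload.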
