lispguy

Reputation: 115

How to correct leading zeroes in JSON with Python

I have a malformed JSON file in which some numbers have leading zeroes.

import json

p = """[
{
    "name": "Alice",
    "RegisterNumber": 911100020001
},
{
    "name": "Bob",
    "RegisterNumber": 000111110300
}
]"""
arc = json.loads(p)

I get this error.

JSONDecodeError: Expecting ',' delimiter: line 8 column 24 (char 107)

Here's what is on char 107:

print(p[107])
#0

The problem is that this is the data I have. I am only showing two records here, but my file has millions of lines to parse, so I need a script. In the end, I need this string:

"""[
{
    "name": "Alice",
    "RegisterNumber": "911100020001"
},
{
    "name": "Bob",
    "RegisterNumber": "000111110300"
}
]"""

How can I do it?

Upvotes: 1

Views: 954

Answers (3)

holdenweb

Reputation: 37023

Since the problem is the leading zeroes, the easy way to fix the data is to split it into lines and fix any line that exhibits the problem. It's cheap and nasty, but it seems to work.

import json

data = """[
{
    "name": "Alice",
    "RegisterNumber": 911100020001
},
{
    "name": "Bob",
    "RegisterNumber": 000111110300
}
]"""
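result = []
for line in data.splitlines():
    if ': 0' in line:
        # Strip the leading zeroes one at a time. Note that this drops them
        # from the value rather than preserving them.
        while ": 0" in line:
            line = line.replace(': 0', ': ')
        # Quote what is left; assumes the number ends the line (no trailing comma).
        result.append(line.replace(': ', ': "') + '"')
    else:
        result.append(line)
data = "\n".join(result)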

arc = json.loads(data)
print(arc)
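
The question asks for the zeroes to be kept, so here is a hedged variant of the same line-by-line idea that quotes the value instead of trimming it. It is only a sketch: the helper name quote_number_line is made up for illustration, and it assumes each key/value pair sits on its own line, as in the sample.

```python
import json


def quote_number_line(line):
    """Wrap a bare all-digit value in quotes, keeping any leading zeroes."""
    key, sep, value = line.partition(': ')
    stripped = value.rstrip()
    has_comma = stripped.endswith(',')
    digits = stripped.rstrip(',')
    # Only touch lines of the form `key: 000123` or `key: 000123,`.
    if sep and digits.isdigit():
        return f'{key}: "{digits}"' + (',' if has_comma else '')
    return line


data = """[
{
    "name": "Alice",
    "RegisterNumber": 911100020001
},
{
    "name": "Bob",
    "RegisterNumber": 000111110300
}
]"""

fixed = "\n".join(quote_number_line(line) for line in data.splitlines())
arc = json.loads(fixed)
print(arc[1]["RegisterNumber"])  # 000111110300
```

Unlike the loop above, this also turns Alice's number into a string, which matches the output requested in the question.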

Upvotes: 1

Loïc Faure-Lacroix

Reputation: 13600

This won't be pretty, but you could probably fix it with a regex.

import json
import re
p = "..."
sub = re.sub(r'"RegisterNumber":\W([0-9]+)', r'"RegisterNumber": "\1"', p)
json.loads(sub)

This matches every place where "RegisterNumber": is followed by a single non-word character (here, the space) and then a run of digits, and wraps those digits in quotes.
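
As a quick end-to-end check, the substitution can be run against the sample data from the question. This is a sketch rather than a drop-in fix: the names pattern and records are illustrative, and \s* is swapped in for \W as an assumption so that zero or several whitespace characters after the colon are tolerated.

```python
import json
import re

p = """[
{
    "name": "Alice",
    "RegisterNumber": 911100020001
},
{
    "name": "Bob",
    "RegisterNumber": 000111110300
}
]"""

# Capture the digits that follow the key and put them back inside quotes.
pattern = re.compile(r'"RegisterNumber":\s*([0-9]+)')
sub = pattern.sub(r'"RegisterNumber": "\1"', p)

records = json.loads(sub)
print(records[1]["RegisterNumber"])  # 000111110300
```

Because an already-quoted value starts with a quote character rather than a digit, running the substitution a second time leaves the text unchanged.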

Upvotes: 2

Dr. V

Reputation: 1914

Read the file (ideally line by line) and replace all the numeric values with their string representation. You can use regular expressions for that (the re module). Then save the result and parse the now-valid JSON later.

If the data fits into memory, you don't need to save the file, of course; just pass the now-valid JSON string to json.loads.

Here is a simple version:

import json

p = """[
{
    "name": "Alice",
    "RegisterNumber": 911100020001
},
{
    "name": "Bob",
    "RegisterNumber": 000111110300
}
]"""

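from re import sub

# Wrap every 12-digit run in quotes; assumes all register numbers are exactly
# 12 digits and that no other 12-digit run appears in the data.
p = sub(r"(\d{12})", r'"\1"', p)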

arc = json.loads(p)
print(arc[1])
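
For a file too large to hold in memory, the line-by-line route mentioned at the start could look roughly like the sketch below. The function name fix_file, the NUMBER pattern, and the file names are all hypothetical, and the code assumes each numeric value ends its line (optionally followed by a comma), as in the sample.

```python
import json
import re
import tempfile
from pathlib import Path

# A colon, then a bare run of digits at the end of the line (optional comma).
NUMBER = re.compile(r'(:\s*)(\d+)(\s*,?\s*)$')


def fix_file(src, dst):
    """Copy src to dst one line at a time, quoting bare numeric values."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(NUMBER.sub(r'\1"\2"\3', line))


sample = """[
{
    "name": "Alice",
    "RegisterNumber": 911100020001
},
{
    "name": "Bob",
    "RegisterNumber": 000111110300
}
]"""

# Demonstrate on a temporary copy of the sample data.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "broken.json"
    dst = Path(tmp) / "fixed.json"
    src.write_text(sample, encoding="utf-8")
    fix_file(src, dst)
    arc = json.loads(dst.read_text(encoding="utf-8"))

print(arc[1])  # {'name': 'Bob', 'RegisterNumber': '000111110300'}
```

Only one line is held in memory while rewriting; the final json.loads call is there just to confirm that the repaired file parses.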

Upvotes: 5
