Reputation: 1543
I need to download data from certain database where data is stored in JSON. A sample of an object that I download looks like this :
data = [
{
'field1': 'value1',
'field2': 'value2',
'field3': null
},
{
'field1': 'value1',
'field2': 'value2',
'field3': 'value3'
}
]
Note that for field3
, what I receive in the data
JSON object is not a string null
, i.e. with single quotes wrapping it, but literally null
. This is just the way the database server returns a data request.
When I want to work on data
as a Python dictionary (or mashed into data frame, etc.), I get the error:
NameError: name 'null' is not defined
How would I convert all of these null
into a string null
, just like string value1
, value2
?
Further details for clarity:
This is how I obtain object data
(by making a requests
to a database:
import requests
import json
import pandas as pd
req = requests.get(...)
data = req.json()
print(json.dumps(data, indent=4, default=str))
df = pd.DataFrame(data)
The database that I call the requests.get()
from returns a JSON object that include null
as value. When I print the data
out as above, what is displayed is as above (first snippet of code).
print(type(data))
gives <class 'dict'>
.
the last line
df = pd.DataFrame(data)
gives the error above.
Upvotes: 0
Views: 4036
Reputation: 61654
It is important to be pedantic here to understand the problem. Sorry if it's a bit painful.
The database that I call the requests.get() from returns a JSON object that include null as value.
No, it does not. The database returns bytes, which represent text, which uses the JSON format to represent a JSON object.
"a JSON object" is, at best, a thing that only exists in a Javascript program. It is better to think of it as an abstraction. We say "object" because that's what Javascript calls its dictionaries (also called "associative arrays", "mappings" and a few other things, depending on what programming language you are using, or possibly what theoretical background you're leaning on).
When you parse a JSON document (i.e.: a sequence of bytes like what the database returned) in a Python program, normally you can expect to get a native Python data structure, implemented using the built-in data types. So the top-level JSON object will be represented with a Python dict
. JSON arrays will be represented with list
s. Numbers will be represented with int
and float
as appropriate. Strings will, unsurprisingly, be str
instances. true
and false
will become True
and False
, i.e. the two pre-defined Python boolean values.
And null
....
When I print the data out as above, what is displayed is as above (first snippet of code).
No, it is not. You will indeed see the display of a Python dictionary, as you show; but the way the value for the 'field3'
key is rendered, is not null
. Instead, it is None
.
That is because None
is a built-in Python object that every reasonable JSON parser (including the one built into requests
, and the standard library one) uses to represent a "null" value in JSON.
When I want to work on data as a Python dictionary (or mashed into data frame, etc.), I get the error:
You will not get this error if data
actually comes from parsing JSON. You will get it when you try to hard-code that Python representation of JSON into your program.
That is because the hard-coded representation should not say null
; it should say None
. Because that is the way that you write code to specify the value that is used in Python to represent a JSON null value.
How would I convert all of these null into a string null, just like string value1, value2?
You do not want to do this. Data types are important, and exist for a reason. By using the string 'null'
to represent a null value, you lose the ability to know whether it's really a null value, or an actual string with a lowercase n, lowercase u etc. This sort of thing has caused real problems for real people before.
What you want to do is write the literal None
in your program when you create this kind of data structure from scratch; and when you deal with this data - whether you get it from parsed JSON, a hand-written structure or any other process - look for these values by checking whether something is None
.
Just for completeness:
print(json.dumps(data, indent=4, default=str))
Doing this re-creates text in the JSON format which corresponds to the parsed JSON data, and displays it. So of course you will see null
if you do this, because that is what the JSON format uses. (You will also see double quotes for all strings, because the JSON format does not allow single quotes for strings.)
Upvotes: 3