Tristan Tran
Tristan Tran

Reputation: 1543

null as value in dictionary or JSON object

I need to download data from certain database where data is stored in JSON. A sample of an object that I download looks like this :

data =  [ 
              {
                 'field1': 'value1',
                 'field2': 'value2',
                 'field3': null
              },          
               {
                 'field1': 'value1',
                 'field2': 'value2',
                 'field3': 'value3'
              }

       ]

Note that for field3, what I receive in the data JSON object is not a string null, i.e. with single quotes wrapping it, but literally null. This is just the way the database server returns a data request.

When I want to work on data as a Python dictionary (or mashed into data frame, etc.), I get the error:

NameError: name 'null' is not defined

How would I convert all of these null into a string null, just like string value1, value2?

Further details for clarity:

This is how I obtain object data (by making a requests to a database:

import requests
import json
import pandas as pd 

req = requests.get(...)
data = req.json()

print(json.dumps(data, indent=4, default=str))

df = pd.DataFrame(data)

The database that I call the requests.get() from returns a JSON object that include null as value. When I print the data out as above, what is displayed is as above (first snippet of code).

print(type(data))

gives <class 'dict'>.

the last line df = pd.DataFrame(data)

gives the error above.

Upvotes: 0

Views: 4036

Answers (1)

Karl Knechtel
Karl Knechtel

Reputation: 61654

It is important to be pedantic here to understand the problem. Sorry if it's a bit painful.

The database that I call the requests.get() from returns a JSON object that include null as value.

No, it does not. The database returns bytes, which represent text, which uses the JSON format to represent a JSON object.

"a JSON object" is, at best, a thing that only exists in a Javascript program. It is better to think of it as an abstraction. We say "object" because that's what Javascript calls its dictionaries (also called "associative arrays", "mappings" and a few other things, depending on what programming language you are using, or possibly what theoretical background you're leaning on).

When you parse a JSON document (i.e.: a sequence of bytes like what the database returned) in a Python program, normally you can expect to get a native Python data structure, implemented using the built-in data types. So the top-level JSON object will be represented with a Python dict. JSON arrays will be represented with lists. Numbers will be represented with int and float as appropriate. Strings will, unsurprisingly, be str instances. true and false will become True and False, i.e. the two pre-defined Python boolean values.

And null....

When I print the data out as above, what is displayed is as above (first snippet of code).

No, it is not. You will indeed see the display of a Python dictionary, as you show; but the way the value for the 'field3' key is rendered, is not null. Instead, it is None.

That is because None is a built-in Python object that every reasonable JSON parser (including the one built into requests, and the standard library one) uses to represent a "null" value in JSON.

When I want to work on data as a Python dictionary (or mashed into data frame, etc.), I get the error:

You will not get this error if data actually comes from parsing JSON. You will get it when you try to hard-code that Python representation of JSON into your program.

That is because the hard-coded representation should not say null; it should say None. Because that is the way that you write code to specify the value that is used in Python to represent a JSON null value.

How would I convert all of these null into a string null, just like string value1, value2?

You do not want to do this. Data types are important, and exist for a reason. By using the string 'null' to represent a null value, you lose the ability to know whether it's really a null value, or an actual string with a lowercase n, lowercase u etc. This sort of thing has caused real problems for real people before.

What you want to do is write the literal None in your program when you create this kind of data structure from scratch; and when you deal with this data - whether you get it from parsed JSON, a hand-written structure or any other process - look for these values by checking whether something is None.


Just for completeness:

print(json.dumps(data, indent=4, default=str))

Doing this re-creates text in the JSON format which corresponds to the parsed JSON data, and displays it. So of course you will see null if you do this, because that is what the JSON format uses. (You will also see double quotes for all strings, because the JSON format does not allow single quotes for strings.)

Upvotes: 3

Related Questions