aldy_abe
aldy_abe

Reputation: 117

Python - Flattening the nested JSON file

I am working with API data that I pull from the Nobel Prize API.
Particularly the following endpoints (Nobel API version 2): https://api.nobelprize.org/2.0/laureates
I have successfully pulled them using a simple Python Script that I build.

Below are some of the example of the JSON file contents.
But first, allow me to apologize in advance about the length of the snippet.
I couldn't find a better way to present it.

Type - 1

{
    "laureates": [
        {
            "birth": {
                "date": "1943-00-00",
                "place": {
                    "city": {
                        "en": "Montclair, NJ",
                        "no": "Montclair, NJ",
                        "se": "Montclair, NJ"
                    },
                    "cityNow": {
                        "en": "Montclair, NJ",
                        "no": "Montclair, NJ",
                        "se": "Montclair, NJ"
                    },
                    "continent": {
                        "en": "North America"
                    },
                    "country": {
                        "en": "USA",
                        "no": "USA",
                        "se": "USA"
                    },
                    "countryNow": {
                        "en": "USA",
                        "no": "USA",
                        "se": "USA"
                    },
                    "locationString": {
                        "en": "Montclair, NJ, USA",
                        "no": "Montclair, NJ, USA",
                        "se": "Montclair, NJ, USA"
                    }
                }
            },
            "familyName": {
                "en": "Spence",
                "se": "Spence"
            },
            "fullName": {
                "en": "A. Michael Spence",
                "se": "A. Michael Spence"
            },
            "gender": "male",
            "givenName": {
                "en": "A. Michael",
                "se": "A. Michael"
            },
            "id": "745",
            "knownName": {
                "en": "A. Michael Spence",
                "se": "A. Michael Spence"
            },
            "links": {
                "action": "Get",
                "href": "http://masterdataapi.nobelprize.org/2/laureate/745",
                "rel": "laureate",
                "types": "application/json"
            },
            "nobelPrizes": [
                {
                    "affiliations": [
                        {
                            "city": {
                                "en": "Stanford, CA",
                                "no": "Stanford, CA",
                                "se": "Stanford, CA"
                            },
                            "cityNow": {
                                "en": "Stanford, CA",
                                "no": "Stanford, CA",
                                "se": "Stanford, CA"
                            },
                            "country": {
                                "en": "USA",
                                "no": "USA",
                                "se": "USA"
                            },
                            "countryNow": {
                                "en": "USA",
                                "no": "USA",
                                "se": "USA"
                            },
                            "locationString": {
                                "en": "Stanford, CA, USA",
                                "no": "Stanford, CA, USA",
                                "se": "Stanford, CA, USA"
                            },
                            "name": {
                                "en": "Stanford University",
                                "no": "Stanford University",
                                "se": "Stanford University"
                            },
                            "nameNow": {
                                "en": "Stanford University"
                            }
                        }
                    ],
                    "awardYear": "2001",
                    "category": {
                        "en": "Economic Sciences",
                        "no": "\u00d8konomi",
                        "se": "Ekonomi"
                    },
                    "categoryFullName": {
                        "en": "The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel",
                        "no": "Sveriges Riksbanks pris i \u00f8konomisk vitenskap til minne om Alfred Nobel",
                        "se": "Sveriges Riksbanks pris i ekonomisk vetenskap till Alfred Nobels minne"
                    },
                    "dateAwarded": "2001-10-10",
                    "links": {
                        "action": "Get",
                        "href": "https://masterdataapi.nobelprize.org/2/nobelPrize/eco/2001",
                        "rel": "nobelPrize",
                        "types": "application/json"
                    },
                    "motivation": {
                        "en": "for their analyses of markets with asymmetric information",
                        "se": "f\u00f6r deras analys av marknader med assymetrisk informations"
                    },
                    "portion": "1/3",
                    "prizeAmount": 10000000,
                    "prizeAmountAdjusted": 12295082,
                    "prizeStatus": "received",
                    "sortOrder": "2"
                }
            ]
        }}

The type-2 view of the content has a new "key" called "death".
It located between the "birth" key and "familyName" key. Below is an example of it.

"death": {
                "date": "2009-09-08",
                "place": {
                    "city": {
                        "en": "Copenhagen",
                        "no": "K\u00f8benhavn",
                        "se": "K\u00f6penhamn"
                    },
                    "cityNow": {
                        "en": "Copenhagen",
                        "no": "K\u00f8benhavn",
                        "se": "K\u00f6penhamn"
                    },
                    "continent": {
                        "en": "Europe"
                    },
                    "country": {
                        "en": "Denmark",
                        "no": "Danmark",
                        "se": "Danmark"
                    },
                    "countryNow": {
                        "en": "Denmark",
                        "no": "Danmark",
                        "se": "Danmark"
                    },
                    "locationString": {
                        "en": "Copenhagen, Denmark",
                        "no": "K\u00f8benhavn, Danmark",
                        "se": "K\u00f6penhamn, Danmark"
                    }
                }
            }

I am quite a noob when it comes to JSON.
However, from my understanding, the above JSON file is heavily nested so it will require some form of flattening before we could store it in Pandas DataFrame.

This is where I am stuck and I have searched for similar topics on Stack Overflow.
However, I could not get my head around it. It seems that I could use the json_normalize function in Pandas. But, I am not sure what to put on the following parameters record_path and meta, given the structure of the JSON file.

Thus, would Stack Overflow community kindly help me progress in my learning to use JSON file?
Many thanks for your time!

(In case I was not clear enough,
I want to flatten my current JSON file into the flattened version so I can use it as a Pandas DataFrame.
I apologise for this!)

Upvotes: 0

Views: 183

Answers (1)

Lucaash
Lucaash

Reputation: 131

assuming your first snipped is stored in a variable called data, all you need to do is:

import pandas

laureates: pandas.core.frame.DataFrame = pandas.json_normalize(data['laureates'])

and it will give you a DataFrame. The issue is that each laureate has list of nobel prizes, so you might want to extract that into a separate DataFrame (otherwise you will end up with a DF with a list of dicts in nobelPrizes column)

Upvotes: 2

Related Questions