khasis lampard
khasis lampard

Reputation: 33

Convert heavily nested json file into R/Python dataframe

I have found a numerous number of similar questions on stackoverflow, however, one issue remains unsolved to me. I have a heavily nested “.json” file I need to import and convert into R or Python data.frame to work with. Json file contains lists inside (usually empty but sometime contains data). Example of json's structure:

enter image description here

I use R's library jsonlite and Python's pandas.

# R
jsonlite::fromJSON(json_file, flatten = TRUE)
# or
jsonlite::read_json(json_file, simplifyVector = T)

# Python
with open(json_file.json, encoding = "utf-8") as f:
   data = json.load(f)

pd.json_normalize(data)

Generally, in both cases it work. The output looks like a normal data.frame, however, the problem is that some columns of a new data.frame contain an embedded lists (I am not sure about "embedded lists" whether it's correct and clear). Seems that both Pandas and jsonlite combined each list into single column, which is clearly seen in the screens below.

R

enter image description here

Python

enter image description here

As you might see some columns, such as wymagania.wymaganiaKonieczne.wyksztalcenia is nothing but a vector contains a combined/embedded list, i.e. content of a list has been combined into single column.

As a desired output I want to split each element of such lists as a single column of a data.frame. In other words, I want to obtain normal “in tidy sense” data.frame without any nested either data.frames and lists. Both R and Python codes are appreciated.

Minimum reproducible example:

[
{
"warunkiPracyIPlacy":{"miejscePracy":"abc","rodzajObowiazkow":"abc","zakresObowiazkow":"abc","rodzajZatrudnienia":"abc","kodRodzajuZatrudnienia":"abc","zmianowosc":"abc"},
"wymagania":{
    "wymaganiaKonieczne":{
        "zawody":[],
        "wyksztalcenia":["abc"],
        "wyksztalceniaSzczegoly":[{"kodPoziomuWyksztalcenia":"RPs002|WY","kodTypuWyksztalcenia":"abc"}],
        "jezyki":[],
        "jezykiSzczegoly":[],
        "uprawnienia":[]},
    "wymaganiaPozadane":{
        "zawody":[],
        "zawodySzczegoly":[],
        "staze":[]},
    "wymaganiaDodatkowe":{"zawody":[],"zawodySzczegoly":[]},
    "inneWymagania":"abc"
    },
"danePracodawcy":{"pracodawca":"abc","nip":"abc","regon":"abc","branza":null},
"pozostaleDane":{"identyfikatorOferty":"abc","ofertaZgloszonaPrzez":"abc","ofertaZgloszonaPrzezKodJednostki":"abc"},
"typOferty":"abc",
"typOfertyNaglowek":"abc",
"rodzajOferty":["DLA_ZAREJESTROWANYCH"],"staz":false,"link":false}
]

Upvotes: 1

Views: 1067

Answers (1)

bert wassink
bert wassink

Reputation: 369

This is an answer for Python. It is not very elegant, but I think it will do for your purpose.

I have called your example file nested_json.json

import json
import pandas as pd

json_file = "nested_json.json"
with open(json_file, encoding="utf-8") as f:
   data = json.load(f)

df = pd.json_normalize(data)

df_exploded = df.apply(lambda x: x.explode()).reset_index(drop=True)

# check based on first row whether its of type dict
columns_dict = df_exploded.columns[df_exploded.apply(lambda x: isinstance(x[0], dict))]

# append the splitted dict to the dataframe 
for col in columns_dict:
   df_splitted_dict = df_exploded[col].apply(pd.Series)
   df_exploded = pd.concat([df_exploded, df_splitted_dict], axis=1)

This leads to a rectangular dataframe

>>> df_exploded.T
                                                                                             0
typOferty                                                                                         abc
typOfertyNaglowek                                                                                 abc
rodzajOferty                                                                     DLA_ZAREJESTROWANYCH
staz                                                                                            False
link                                                                                            False
warunkiPracyIPlacy.miejscePracy                                                                   abc
warunkiPracyIPlacy.rodzajObowiazkow                                                               abc
warunkiPracyIPlacy.zakresObowiazkow                                                               abc
warunkiPracyIPlacy.rodzajZatrudnienia                                                             abc
warunkiPracyIPlacy.kodRodzajuZatrudnienia                                                         abc
warunkiPracyIPlacy.zmianowosc                                                                     abc
wymagania.wymaganiaKonieczne.zawody                                                               NaN
wymagania.wymaganiaKonieczne.wyksztalcenia                                                        abc
wymagania.wymaganiaKonieczne.wyksztalceniaSzcze...  {'kodPoziomuWyksztalcenia': 'RPs002|WY', 'kodT...
wymagania.wymaganiaKonieczne.jezyki                                                               NaN
wymagania.wymaganiaKonieczne.jezykiSzczegoly                                                      NaN
wymagania.wymaganiaKonieczne.uprawnienia                                                          NaN
wymagania.wymaganiaPozadane.zawody                                                                NaN
wymagania.wymaganiaPozadane.zawodySzczegoly                                                       NaN
wymagania.wymaganiaPozadane.staze                                                                 NaN
wymagania.wymaganiaDodatkowe.zawody                                                               NaN
wymagania.wymaganiaDodatkowe.zawodySzczegoly                                                      NaN
wymagania.inneWymagania                                                                           abc
danePracodawcy.pracodawca                                                                         abc
danePracodawcy.nip                                                                                abc
danePracodawcy.regon                                                                              abc
danePracodawcy.branza                                                                            None
pozostaleDane.identyfikatorOferty                                                                 abc
pozostaleDane.ofertaZgloszonaPrzez                                                                abc
pozostaleDane.ofertaZgloszonaPrzezKodJednostki                                                    abc
kodPoziomuWyksztalcenia                                                                     RPs002|WY
kodTypuWyksztalcenia                                                                              abc

Upvotes: 2

Related Questions