Merge nested key/values and nested list into a json

Question

I am trying to merge nested list of dictionary key/value into single key and list of values. I am loading csv file into data frame and from that I am trying to convert it into nested json. Please see below I have tried this. Should I be going this route to create json or does pandas have a native functionality that do this type of conversion?

Sample Data:

Subject,StudentName,Category
ENGLISH,Jane,
ENGLISH,,A
MATH,Matt,B
MATH,Newman,AA
MATH,,B
MATH,Dylan,A
ENGLISH,Noah,
ENGLISH,,C

Tried this:

df1 = pd.read_csv('../data/file.csv')
json_doc = defaultdict(list)
for _id in df1.T:
    data = df1.T[_id]
    key = data.Subject
    values = {'StudentName': data.StudentName,'Category':data.Category}
    json_doc[key].append(values)

new_d = json.dumps(json_doc, indent=4)

{k: int(v) for k, v in new_d} # error: ValueError: not enough values to unpack (expected 2, got 1)

and I get this from the code above:

{
    "ENGLISH": [
        {
            "StudentName": "Jane",
            "Category": NaN
        },
        {
            "StudentName": NaN,
            "Category": "A"
        },
        {
            "StudentName": "Noah",
            "Category": NaN
        },
        {
            "StudentName": NaN,
            "Category": "C"
        }
    ],
    "MATH": [
        {
            "StudentName": "Matt",
            "Category": "B"
        },
        {
            "StudentName": "Newman",
            "Category": "AA"
        },
        {
            "StudentName": NaN,
            "Category": "B"
        },
        {
            "StudentName": "Dylan",
            "Category": "A"
        }
    ]
}

How do I merge key/value to get it look like this one?

{
    "ENGLISH": [
        {
            "StudentName": ["Jane","Noah"],
            "Category": ["A","C"]
        }
    ],
    "MATH": [
        {
            "StudentName": ["Matt","Newman","Dylan"]
            "Category": ["B","AA","A"]
        }
    ]
}

Marat · Accepted Answer

It is not entirely clear to me if it is safe to ignore missing values, but here is my one-liner:

df.groupby('Subject').agg(lambda g: list(g.dropna())).to_dict(orient='index')

Default methods (to_json, to_dict) do not have a suitable orient option. So, we have to do some work by hands by grouping by index and then converting column data to a list. Then, .to_dict(orient='index') will do what you want (replace with to_json if you want a string instead of an object).

Note: Subject here is expected to be a column, not an index.

Merge nested key/values and nested list into a json

Answers (1)

Related Questions