How to append a list in Pandas?

Question

I'm reading a dataframe and trying to insert a list inside another list and then converting it to json file. I'm using python 3 and 0.25.3 version of pandas for it.

============================

Data that I'm reading:

id     label        id_customer     label_customer    part_number   number_client

6     Sao Paulo      CUST-99992         Brazil          7897           982

6     Sao Paulo      CUST-99992         Brazil          888            12

92    Hong Kong      CUST-88888         China           147            288

============================

Here is my code:

import pandas as pd 
import json

data = pd.read_excel(path)

data["part_number"] = data["part_number"].apply(lambda x: str(x))
data["number_client"]  = data["number_client"].apply(lambda x: str(x))

data = data.groupby(["id", "label", "id_customer", "label_customer"], as_index=False).agg("#".join)

data["part_number"] = data["part_number"].apply(lambda x: {"part": x})
data["number_client"] = data["number_client"].apply(lambda x: {"client": x})

data["id_customer"] = data["id_customer"].apply(lambda x: {"id": x})
data["label_customer"] = data["label_customer"].apply(lambda x: {"label": x})

data["Customer"] = data.apply(lambda x: [{**x["id_customer"], **x["label_customer"]}],axis=1)
data["number"] = data.apply(lambda x: [{**x["part_number"], **x["number_client"]}], axis=1)

data = data[["id", "label", "Customer","number"]]

data.to_json(path)

=============================

What is expected:

[{
    "id": 6,
    "label": "Sao Paulo",
    "Customer": [{
        "id": "CUS-99992",
        "label": "Brazil",
        "number": [{
        "part": "7897",
        "client": "892"
    },
    {
       "part": "888",
       "client": "12"
    }]
    }]  
}, 
{
    "id": 92,
    "label": "Hong Kong",
    "Customer": [{
        "id": "CUS-88888",
        "label": "China",
        "number": [{
        "part": "147",
        "client": "288"
    }]
    }] 
}]

============================

What I'm getting:

[{
    "id": 6,
    "label": "Sao Paulo",
    "Customer": [{
        "id": "CUS-99992",
        "label": "Brazil"
    }],
    "number": [{
        "part": "7897",
        "client": "892"
    }],
    "number": [{
        "part": "888",
        "client": "12"
    }]
}, {
    "id": 92,
    "label": "Hong Kong",
    "Customer": [{
        "id": "CUS-88888",
        "label": "China"
    }],
    "number": [{
        "part": "147",
        "client": "288"
    }]
}]

======================

I tried to do the same thing using iterrows function (and posted a question here 'Dataframe and conversion to JSON using Pandas'), but some people recommend me to try another way using another function. I know that is a stupid thing add number object inside my data, but I already tried of others way.

Could you help me?

Valdi_Bo · Accepted Answer

Define the following reformatting function:

def reformat(row):
    d1 = { 'part': str(row.part_number), 'client': str(row.number_client)}
    d2 = { 'id': row.id_customer, 'label': row.label_customer, 'number': [d1] }
    return { 'id': row.id, 'label': row.label, 'Customer': [d2] }

Then apply it the following way:

df.apply(reformat, axis=1).to_json('result.json', orient='records')

The result (reformatted for readability) is:

[ { "id":6,
    "label":"Sao Paulo",
    "Customer":[
      { "id":"CUST-99992",
        "label":"Brazil",
        "number":[{"part":"7897","client":"982"}]
      }
    ]
  },
  { "id":92,
    "label":"Hong Kong",
    "Customer":[
      { "id":"CUST-88888",
        "label":"China",
        "number":[{"part":"147","client":"288"}]
      }
    ]
  }
]

Edit following the comment

To cope with the variant of multiple rows for a single label / label_customer, take another approach:

Start from defining the following functions:

Get the content of number attribute:
```
def getNum(grp):
    return eval(grp[['part', 'client']].to_json(orient='records'))
```
Note eval in this function. Otherwise the result would be a string (instead of list of dictionaries).

Get the content of Customer attribute:

def getCust(grp):
    r0 = grp.iloc[0]
    return { 'id': r0.id_customer, 'label': r0.label_customer, 'number': getNum(grp) }

Get the content of the whole JSON element for the current group:

def getGrp(grp):
    r0 = grp.iloc[0]
    return { 'id': r0.id, 'label': r0.label, 'Customer': getCust(grp) }

Then convert column types to string:

df.part_number = df.part_number.astype('str')
df.number_client = df.number_client.astype('str')

And to get the final result, run:

df.rename(columns={'part_number': 'part', 'number_client': 'client'})\
    .groupby(['id', 'label', 'id_customer', 'label_customer'])\
    .apply(getGrp).to_json(orient='values')

The above code:

Renames part_number and number_client to part and client, respectively. This change is needed to generate proper element names by getNum.
Groups the DataFrame (as in your code).
Applies getGrp function to each group. The result is a Series of JSON elements.
And finally to_json converts this Series to a list of JSON elements.

How to append a list in Pandas?

Answers (1)

Edit following the comment

Related Questions