Lucas
Lucas

Reputation: 97

How to append a list in Pandas?

I'm reading a dataframe and trying to insert a list inside another list and then converting it to json file. I'm using python 3 and 0.25.3 version of pandas for it.

============================

Data that I'm reading:

id     label        id_customer     label_customer    part_number   number_client

6     Sao Paulo      CUST-99992         Brazil          7897           982

6     Sao Paulo      CUST-99992         Brazil          888            12

92    Hong Kong      CUST-88888         China           147            288

============================

Here is my code:

import pandas as pd 
import json

data = pd.read_excel(path)

data["part_number"] = data["part_number"].apply(lambda x: str(x))
data["number_client"]  = data["number_client"].apply(lambda x: str(x))

data = data.groupby(["id", "label", "id_customer", "label_customer"], as_index=False).agg("#".join)

data["part_number"] = data["part_number"].apply(lambda x: {"part": x})
data["number_client"] = data["number_client"].apply(lambda x: {"client": x})

data["id_customer"] = data["id_customer"].apply(lambda x: {"id": x})
data["label_customer"] = data["label_customer"].apply(lambda x: {"label": x})

data["Customer"] = data.apply(lambda x: [{**x["id_customer"], **x["label_customer"]}],axis=1)
data["number"] = data.apply(lambda x: [{**x["part_number"], **x["number_client"]}], axis=1)

data = data[["id", "label", "Customer","number"]]

data.to_json(path)

=============================

What is expected:

[{
    "id": 6,
    "label": "Sao Paulo",
    "Customer": [{
        "id": "CUS-99992",
        "label": "Brazil",
        "number": [{
        "part": "7897",
        "client": "892"
    },
    {
       "part": "888",
       "client": "12"
    }]
    }]  
}, 
{
    "id": 92,
    "label": "Hong Kong",
    "Customer": [{
        "id": "CUS-88888",
        "label": "China",
        "number": [{
        "part": "147",
        "client": "288"
    }]
    }] 
}]

============================

What I'm getting:

[{
    "id": 6,
    "label": "Sao Paulo",
    "Customer": [{
        "id": "CUS-99992",
        "label": "Brazil"
    }],
    "number": [{
        "part": "7897",
        "client": "892"
    }],
    "number": [{
        "part": "888",
        "client": "12"
    }]
}, {
    "id": 92,
    "label": "Hong Kong",
    "Customer": [{
        "id": "CUS-88888",
        "label": "China"
    }],
    "number": [{
        "part": "147",
        "client": "288"
    }]
}]

======================

I tried to do the same thing using iterrows function (and posted a question here 'Dataframe and conversion to JSON using Pandas'), but some people recommend me to try another way using another function. I know that is a stupid thing add number object inside my data, but I already tried of others way.

Could you help me?

Upvotes: 1

Views: 229

Answers (1)

Valdi_Bo
Valdi_Bo

Reputation: 30971

Define the following reformatting function:

def reformat(row):
    d1 = { 'part': str(row.part_number), 'client': str(row.number_client)}
    d2 = { 'id': row.id_customer, 'label': row.label_customer, 'number': [d1] }
    return { 'id': row.id, 'label': row.label, 'Customer': [d2] }

Then apply it the following way:

df.apply(reformat, axis=1).to_json('result.json', orient='records')

The result (reformatted for readability) is:

[ { "id":6,
    "label":"Sao Paulo",
    "Customer":[
      { "id":"CUST-99992",
        "label":"Brazil",
        "number":[{"part":"7897","client":"982"}]
      }
    ]
  },
  { "id":92,
    "label":"Hong Kong",
    "Customer":[
      { "id":"CUST-88888",
        "label":"China",
        "number":[{"part":"147","client":"288"}]
      }
    ]
  }
]

Edit following the comment

To cope with the variant of multiple rows for a single label / label_customer, take another approach:

Start from defining the following functions:

  1. Get the content of number attribute:

    def getNum(grp):
        return eval(grp[['part', 'client']].to_json(orient='records'))
    

    Note eval in this function. Otherwise the result would be a string (instead of list of dictionaries).

  2. Get the content of Customer attribute:

    def getCust(grp):
        r0 = grp.iloc[0]
        return { 'id': r0.id_customer, 'label': r0.label_customer, 'number': getNum(grp) }
    
  3. Get the content of the whole JSON element for the current group:

    def getGrp(grp):
        r0 = grp.iloc[0]
        return { 'id': r0.id, 'label': r0.label, 'Customer': getCust(grp) }
    

Then convert column types to string:

df.part_number = df.part_number.astype('str')
df.number_client = df.number_client.astype('str')

And to get the final result, run:

df.rename(columns={'part_number': 'part', 'number_client': 'client'})\
    .groupby(['id', 'label', 'id_customer', 'label_customer'])\
    .apply(getGrp).to_json(orient='values')

The above code:

  • Renames part_number and number_client to part and client, respectively. This change is needed to generate proper element names by getNum.
  • Groups the DataFrame (as in your code).
  • Applies getGrp function to each group. The result is a Series of JSON elements.
  • And finally to_json converts this Series to a list of JSON elements.

Upvotes: 3

Related Questions