Reputation: 97
I'm reading a dataframe and trying to insert a list inside another list and then converting it to json file. I'm using python 3 and 0.25.3 version of pandas for it.
============================
Data that I'm reading:
id label id_customer label_customer part_number number_client
6 Sao Paulo CUST-99992 Brazil 7897 982
6 Sao Paulo CUST-99992 Brazil 888 12
92 Hong Kong CUST-88888 China 147 288
============================
Here is my code:
import pandas as pd
import json
data = pd.read_excel(path)
data["part_number"] = data["part_number"].apply(lambda x: str(x))
data["number_client"] = data["number_client"].apply(lambda x: str(x))
data = data.groupby(["id", "label", "id_customer", "label_customer"], as_index=False).agg("#".join)
data["part_number"] = data["part_number"].apply(lambda x: {"part": x})
data["number_client"] = data["number_client"].apply(lambda x: {"client": x})
data["id_customer"] = data["id_customer"].apply(lambda x: {"id": x})
data["label_customer"] = data["label_customer"].apply(lambda x: {"label": x})
data["Customer"] = data.apply(lambda x: [{**x["id_customer"], **x["label_customer"]}],axis=1)
data["number"] = data.apply(lambda x: [{**x["part_number"], **x["number_client"]}], axis=1)
data = data[["id", "label", "Customer","number"]]
data.to_json(path)
=============================
What is expected:
[{
"id": 6,
"label": "Sao Paulo",
"Customer": [{
"id": "CUS-99992",
"label": "Brazil",
"number": [{
"part": "7897",
"client": "892"
},
{
"part": "888",
"client": "12"
}]
}]
},
{
"id": 92,
"label": "Hong Kong",
"Customer": [{
"id": "CUS-88888",
"label": "China",
"number": [{
"part": "147",
"client": "288"
}]
}]
}]
============================
What I'm getting:
[{
"id": 6,
"label": "Sao Paulo",
"Customer": [{
"id": "CUS-99992",
"label": "Brazil"
}],
"number": [{
"part": "7897",
"client": "892"
}],
"number": [{
"part": "888",
"client": "12"
}]
}, {
"id": 92,
"label": "Hong Kong",
"Customer": [{
"id": "CUS-88888",
"label": "China"
}],
"number": [{
"part": "147",
"client": "288"
}]
}]
======================
I tried to do the same thing using iterrows
function (and posted a question here 'Dataframe and conversion to JSON using Pandas'), but some people recommend me to try another way using another function. I know that is a stupid thing add number
object inside my data
, but I already tried of others way.
Could you help me?
Upvotes: 1
Views: 229
Reputation: 30971
Define the following reformatting function:
def reformat(row):
d1 = { 'part': str(row.part_number), 'client': str(row.number_client)}
d2 = { 'id': row.id_customer, 'label': row.label_customer, 'number': [d1] }
return { 'id': row.id, 'label': row.label, 'Customer': [d2] }
Then apply it the following way:
df.apply(reformat, axis=1).to_json('result.json', orient='records')
The result (reformatted for readability) is:
[ { "id":6,
"label":"Sao Paulo",
"Customer":[
{ "id":"CUST-99992",
"label":"Brazil",
"number":[{"part":"7897","client":"982"}]
}
]
},
{ "id":92,
"label":"Hong Kong",
"Customer":[
{ "id":"CUST-88888",
"label":"China",
"number":[{"part":"147","client":"288"}]
}
]
}
]
To cope with the variant of multiple rows for a single label / label_customer, take another approach:
Start from defining the following functions:
Get the content of number attribute:
def getNum(grp):
return eval(grp[['part', 'client']].to_json(orient='records'))
Note eval in this function. Otherwise the result would be a string (instead of list of dictionaries).
Get the content of Customer attribute:
def getCust(grp):
r0 = grp.iloc[0]
return { 'id': r0.id_customer, 'label': r0.label_customer, 'number': getNum(grp) }
Get the content of the whole JSON element for the current group:
def getGrp(grp):
r0 = grp.iloc[0]
return { 'id': r0.id, 'label': r0.label, 'Customer': getCust(grp) }
Then convert column types to string:
df.part_number = df.part_number.astype('str')
df.number_client = df.number_client.astype('str')
And to get the final result, run:
df.rename(columns={'part_number': 'part', 'number_client': 'client'})\
.groupby(['id', 'label', 'id_customer', 'label_customer'])\
.apply(getGrp).to_json(orient='values')
The above code:
Upvotes: 3