Reputation: 709
I want to create my dataframe which looks like this:
employeeId firstName lastName emailAddress isDependent employeeIdTypeCode entityCode sourceCode roleCode
0 E123456 Andrew Hoover [email protected] False 001 AE AHR EMPLR
0 102939485 Andrew Hoover [email protected] False 002 AE AHR EMPLR
2 E123458 Celeste Riddick [email protected] True 001 AE AHR EMPLR
2 354852739 Celeste Riddick [email protected] True 002 AE AHR EMPLR
1 E123457 Curt Austin [email protected] True 001 AE AHR EMPLR
1 675849302 Curt Austin [email protected] True 002 AE AHR EMPLR
3 E123459 Hazel Tooley [email protected] False 001 AE AHR EMPLR
3 937463528 Hazel Tooley [email protected] False 002 AE AHR EMPLR
And for each row, I want to convert it into a nested JSON format. So I want my JSON to look something like this for each individual, since I want to iterate over the dataframe and post it to an api.
{
"individualInfo":
{
"individualIdentifier":[
{
"identityTypeCode":"001",
"identifierValue":"E123456",
"profileInfo":{
"firstName":"Andrew",
"lastName":"Hoover",
"emailAddress":"[email protected]"
}
},
{
"identityTypeCode":"002",
"identifierValue":"102939485",
"profileInfo":{
"firstName":"Andrew",
"lastName":"Hoover",
"emailAddress":"[email protected]"
}
}
],
"entityCode":"AE",
"sourceCode":"AHR",
"roleCode":"EMPLR"
"isDependent":False
}
}
The important thing here is that I want my JSON to be generated agnostic of the Id
columns coming on the dataframe. So, if there is, for example, another ID coming on the dataframe, then I want that ID to have another dictionary object with the same profile info. So each profile can have any number of Id
with it.
Code I could do:
j = (result.groupby(['identifierValue','identityTypeCode'], as_index=False).apply(lambda x: x[['firstName','lastName','emailAddress']].to_dict('r')).reset_index().rename(columns={0:'ProfileInfo'}).to_json(orient='records'))
Would it be possible to achieve something like this dynmically in pandas? Thank you so much for the help!
Few of other questions that I could find for nesting:
Convert Pandas Dataframe to nested JSON
None of these questions are helping me out since I want each index of my dataframe to be converted into an individual JSON payload, as each individual is going to an api service I have for the purpose of posting the data to the database.
Upvotes: 4
Views: 1785
Reputation: 1672
Perhaps you can iterate over a group by, then do another iteration for each row within that group. Thus, creating a nested dictionary structure:
This explains one way going through with it:
import pandas as pd
df = pd.DataFrame({"entityCode":[1,1,3,3],"sourceCode":[4,4,6,6],'identityTypeCode':[7,8,9,10]})
results = []
for i, sub_df in df.groupby(["entityCode","sourceCode"]):
entityCode, sourceCode = i
d = {}
d["individualInfo"] = {"entityCode":entityCode, "sourceCode":sourceCode}
sub_result = []
for _, row in sub_df[["identityTypeCode"]].drop_duplicates().iterrows():
sub_result.append(row.to_dict())
d["individualIdentifier"] = sub_result
results.append(d)
results
which returns something like this:
[{'individualInfo': {'entityCode': 1, 'sourceCode': 4},
'individualIdentifier': [{'identityTypeCode': 7}, {'identityTypeCode': 8}]},
{'individualInfo': {'entityCode': 3, 'sourceCode': 6},
'individualIdentifier': [{'identityTypeCode': 9}, {'identityTypeCode': 10}]}]
afterwards, you can convert the dictionary to json.
Upvotes: 0
Reputation: 939
Not really a Pandas solution but kinds works:
Starts from your result
dataframe
from collections import defaultdict
import json
result = 'your data frame'
dicted = defaultdict(dict)
for r in result.values.tolist():
identifierValue, firstName, lastName, emailAddress,isDependent,\
identityTypeCode, entityCode, sourceCode,roleCode = r
tupled_criteria = (firstName,lastName,emailAddress)
if dicted[tupled_criteria].get("individualInfo"):
pass
else:
dicted[tupled_criteria]["individualInfo"] = {}
dicted[tupled_criteria]["individualInfo"]['entityCode'] = entityCode
dicted[tupled_criteria]["individualInfo"]['soruceCode'] = sourceCode
dicted[tupled_criteria]["individualInfo"]['roleCode'] = roleCode
dicted[tupled_criteria]["individualInfo"]['isDependent'] = isDependent
if dicted[tupled_criteria]["individualInfo"].get("individualIdentifier"):
pass
else:
dicted[tupled_criteria]["individualInfo"]["individualIdentifier"] = []
dicted[tupled_criteria]["individualInfo"]["individualIdentifier"]\
.append({"identityTypeCode":identityTypeCode,
"identifierValue":identifierValue,
"profileInfo":{
"firstName":firstName,
"lastName":lastName,
"emailAddress":emailAddress}})
for k,v in dicted.items():
print(k,'\n',json.dumps(v),'\n\n')
Upvotes: 0
Reputation: 1604
It sounds like the most sensible way to pull this off is:
info_dict = df.set_index(['identifierValue', 'identifierValue']).to_dict('index')
Then every time you get to profileInfo
in your JSON, you can reference the info_dict
above with the appropriate ('identifierValue', 'identifierValue')` key pair
I'm confused about what your desired formatting is, but this is a start.
Upvotes: 5