Reputation: 592
The data looks like this :
The expected Json fomat is like this
{
"DataExtractName": "SalesDataExtract",
"BusinessName" : {
"InvoiceDate": {
"SourceSystem": {
"MYSQL" : "Invc_Dt",
"CSV" : "Invc_Date"
},
"DataType": {
"MYSQL" : "varchar",
"CSV" : "string"
}
},
"Description": {
"SourceSystem": {
"MYSQL" : "Prod_Desc",
"CSV" : "Prod_Descr"
},
"DataType": {
"MYSQL" : "varchar",
"CSV" : "string"
}
}
}
},
{
"DataExtractName": "DateDataExtract",
"BusinessName" : {
"InvoiceDate": {
"SourceSystem": {
"MYSQL" : "Date"
},
"DataType": {
"MYSQL" : "varchar"
}
}
}
}
How do i achieve this using python dataframes? Or do i need to write some script to make the data like this?
Note
I've tried using -
Upvotes: 1
Views: 577
Reputation: 3929
With so many nested structures, you should use marshmallow. It is built with your use case in mind. Please check out the excellent documentation: https://marshmallow.readthedocs.io/en/stable/ . All you need is the masic usage.
It is a lot of code, but better be explicit than clever. I am sure a shorter solution exists, but it is probably unmaintainable. Also I had to build your dataframe. Please provide it in a data format next time.
import pandas as pd
import marshmallow as ma
# build test data
df = pd.DataFrame.from_records([
['InvoiceDate', 'MYSQL', 'Invc_Dt', 'varchar', 'SalesDataExtract'],
['InvoiceDate', 'CSV', 'Invc_Date', 'string', 'SalesDataExtract'],
['Description', 'MYSQL', 'Prod_Descr', 'varchar', 'SalesDataExtract'],
['Description', 'CSV', 'Prod_Descr', 'string', 'SalesDataExtract'],
['InvoiceDate', 'MYSQL', 'Date', 'varchar', 'DateDataExtract'],]
)
df.columns = ['BusinessName', 'SourceSystem', 'FunctionalName', 'DataType', 'DataExtractName']
# define marshmallow schemas
class SourceSystemTypeSchema(ma.Schema):
MYSQL = ma.fields.String()
CSV = ma.fields.String()
class DataTypeSchema(ma.Schema):
MYSQL = ma.fields.String()
CSV = ma.fields.String()
class InvoiceDateSchema(ma.Schema):
InvoiceDate = ma.fields.Nested(SourceSystemTypeSchema())
DataType = ma.fields.Nested(DataTypeSchema())
class DescriptionSchema(ma.Schema):
SourceSystem = ma.fields.Nested(SourceSystemTypeSchema())
DataType = ma.fields.Nested(DataTypeSchema())
class BusinessNameSchema(ma.Schema):
InvoiceDate = ma.fields.Nested(InvoiceDateSchema())
Description = ma.fields.Nested(DescriptionSchema())
class DataSchema(ma.Schema):
DataExtractName = ma.fields.String()
BusinessName = ma.fields.Nested(BusinessNameSchema())
# building json
result = []
mask_business_name_invoicedate = df.BusinessName == 'InvoiceDate'
mask_business_name_description = df.BusinessName == 'Description'
for data_extract_name in set(df['DataExtractName'].to_list()):
mask_data_extract_name = df.DataExtractName == data_extract_name
# you need these two helper dfs to get the dictionaries
df_source_system = df[mask_data_extract_name & mask_business_name_invoicedate].set_index('SourceSystem').to_dict(orient='dict')
df_description = df[mask_data_extract_name & mask_business_name_description].set_index('SourceSystem').to_dict(orient='dict')
# all dictionaries are defined, so you can use your schemas
source_system_type = SourceSystemTypeSchema().dump(df_source_system['FunctionalName'])
data_type = DataTypeSchema().dump(df_source_system['DataType'])
source_system = SourceSystemTypeSchema().dump(df_description['FunctionalName'])
invoice_date = InvoiceDateSchema().dump({'SourceSystemType': source_system_type, 'DataType': data_type})
description = DescriptionSchema().dump({'SourceSystem': source_system, 'DataType': data_type})
business_name = BusinessNameSchema().dump({'InvoiceDate': invoice_date, 'Description': description})
data = DataSchema().dump({'DataExtractName': data_extract_name, 'BusinessName': business_name})
# end result
result.append(data)
Now,
ma.pprint(result)
returns
[{'BusinessName': {'Description': {'DataType': {'CSV': 'string',
'MYSQL': 'varchar'},
'SourceSystem': {'CSV': 'Prod_Descr',
'MYSQL': 'Prod_Descr'}},
'InvoiceDate': {'DataType': {'CSV': 'string',
'MYSQL': 'varchar'}}},
'DataExtractName': 'SalesDataExtract'},
{'BusinessName': {'Description': {'DataType': {'MYSQL': 'varchar'},
'SourceSystem': {}},
'InvoiceDate': {'DataType': {'MYSQL': 'varchar'}}},
'DataExtractName': 'DateDataExtract'}]
Upvotes: 2