Reputation: 824
I have this small dataframe:
SUB_PROCEDURE ZFIELD ZFIELD2 ZVALUE_ENABLED
----------------------------------------------
SP1 A ZF1 ZV1
SP1 A ZF2 ZV2
SP1 B ZF3 ZV3
SP1 B ZF4 ZV4
SP2 A NaN ZV5
SP2 A NaN ZV6
I would like to obtain this dictionary from it:
desired_output = {
'A': {
'SP1': {'ZF1': 'ZV1', 'ZF2': 'ZV2'},
'SP2': ['ZV5', 'ZV6']
},
'B': {
'SP1': {'ZF3': 'ZV3', 'ZF4': 'ZV4'},
'SP2': None
}
}
The rules would be:
Create a key value pair for each distinct zfield in the dataframe. See rule 2 for the value.
In the above value, create a key value pair for each sub_procedure. See rule 3 for the value.
If ZFIELD2 is NaN, then concatenate in a list the different ZVALUE_ENABLED. If ZFIELD2 is not NaN, then create a key value pair for each combination of ZFIELD2 and ZVALUE_ENABLED.
If there's no entry for that ZFIELD and SUB_PROCEDURE for SP1 or SP2 (configurable), then display key: None.
So far, I've been trying with default dictionaries and this [answer] (Pandas dataframe to dictionary with conditions)
Upvotes: 1
Views: 466
Reputation:
You can convert the DataFrame values to a numpy array and iterate over them in a for loop. In the loop using dict.setdefault
, collect the row items in a nested dictionary structure.
out = {}
for w, x, y, z in df.iloc[:,[1,0,2,3]].to_numpy():
out.setdefault(w, {})
if pd.notna(y):
out[w].setdefault(x, {})[y] = z
else:
out[w].setdefault(x, []).append(z)
This produces
{'A': {'SP1': {'ZF1': 'ZV1', 'ZF2': 'ZV2'}, 'SP2': ['ZV5', 'ZV6']},
'B': {'SP1': {'ZF3': 'ZV3', 'ZF4': 'ZV4'}}}
Now you want to add SP2
key inside B
key. To do that, iterate over the unique values of df['SUB_PROCEDURE']
and for each key-value pair in out
see if a sub-procedure key is missing and if it is, add that key in.
for sp in df['SUB_PROCEDURE'].unique():
for k, v in out.items():
v.setdefault(sp, None)
Final output:
{'A': {'SP1': {'ZF1': 'ZV1', 'ZF2': 'ZV2'}, 'SP2': ['ZV5', 'ZV6']},
'B': {'SP1': {'ZF3': 'ZV3', 'ZF4': 'ZV4'}, 'SP2': None}}
Upvotes: 1
Reputation: 862581
Solution with custom function for dictionaries or lists if all values in ZFIELD2
are missing values, for add None
s is used Series.unstack
, replaced NaN
s to None
s in DataFrame.mask
and last is used DataFrame.to_dict
:
def f(x):
return dict(x[['ZFIELD2','ZVALUE_ENABLED']].to_numpy())
if x['ZFIELD2'].notna().all()
else x['ZVALUE_ENABLED'].tolist()
df = df.groupby(['SUB_PROCEDURE','ZFIELD']).apply(f).unstack()
out = df.mask(df.isna(), None).to_dict()
print (out)
{'A': {'SP1': {'ZF1': 'ZV1', 'ZF2': 'ZV2'},
'SP2': ['ZV5', 'ZV6']},
'B': {'SP1': {'ZF3': 'ZV3', 'ZF4': 'ZV4'},
'SP2': None}}
Upvotes: 1