Javi Torre
Javi Torre

Reputation: 824

DataFrame to dictionary with conditions

I have this small dataframe:

SUB_PROCEDURE   ZFIELD  ZFIELD2 ZVALUE_ENABLED
----------------------------------------------
          SP1        A      ZF1            ZV1
          SP1        A      ZF2            ZV2
          SP1        B      ZF3            ZV3
          SP1        B      ZF4            ZV4
          SP2        A      NaN            ZV5
          SP2        A      NaN            ZV6

I would like to obtain this dictionary from it:

desired_output = {
    'A': {
        'SP1': {'ZF1': 'ZV1', 'ZF2': 'ZV2'},
        'SP2': ['ZV5', 'ZV6']
    },
    'B': {
        'SP1': {'ZF3': 'ZV3', 'ZF4': 'ZV4'},
        'SP2': None
    }
}

The rules would be:

  1. Create a key value pair for each distinct zfield in the dataframe. See rule 2 for the value.

  2. In the above value, create a key value pair for each sub_procedure. See rule 3 for the value.

  3. If ZFIELD2 is NaN, then concatenate in a list the different ZVALUE_ENABLED. If ZFIELD2 is not NaN, then create a key value pair for each combination of ZFIELD2 and ZVALUE_ENABLED.

  4. If there's no entry for that ZFIELD and SUB_PROCEDURE for SP1 or SP2 (configurable), then display key: None.

So far, I've been trying with default dictionaries and this [answer] (Pandas dataframe to dictionary with conditions)

Upvotes: 1

Views: 466

Answers (2)

user7864386
user7864386

Reputation:

You can convert the DataFrame values to a numpy array and iterate over them in a for loop. In the loop using dict.setdefault, collect the row items in a nested dictionary structure.

out = {}
for w, x, y, z in df.iloc[:,[1,0,2,3]].to_numpy():
    out.setdefault(w, {})
    if pd.notna(y):
        out[w].setdefault(x, {})[y] = z
    else:
        out[w].setdefault(x, []).append(z)

This produces

{'A': {'SP1': {'ZF1': 'ZV1', 'ZF2': 'ZV2'}, 'SP2': ['ZV5', 'ZV6']},
 'B': {'SP1': {'ZF3': 'ZV3', 'ZF4': 'ZV4'}}}

Now you want to add SP2 key inside B key. To do that, iterate over the unique values of df['SUB_PROCEDURE'] and for each key-value pair in out see if a sub-procedure key is missing and if it is, add that key in.

for sp in df['SUB_PROCEDURE'].unique():
    for k, v in out.items():
        v.setdefault(sp, None)

Final output:

{'A': {'SP1': {'ZF1': 'ZV1', 'ZF2': 'ZV2'}, 'SP2': ['ZV5', 'ZV6']},
 'B': {'SP1': {'ZF3': 'ZV3', 'ZF4': 'ZV4'}, 'SP2': None}}

Upvotes: 1

jezrael
jezrael

Reputation: 862581

Solution with custom function for dictionaries or lists if all values in ZFIELD2 are missing values, for add Nones is used Series.unstack, replaced NaNs to Nones in DataFrame.mask and last is used DataFrame.to_dict:

def f(x):
    return dict(x[['ZFIELD2','ZVALUE_ENABLED']].to_numpy()) 
           if x['ZFIELD2'].notna().all() 
           else x['ZVALUE_ENABLED'].tolist()

df = df.groupby(['SUB_PROCEDURE','ZFIELD']).apply(f).unstack()
out = df.mask(df.isna(), None).to_dict()
print (out)
{'A': {'SP1': {'ZF1': 'ZV1', 'ZF2': 'ZV2'},
       'SP2': ['ZV5', 'ZV6']}, 
 'B': {'SP1': {'ZF3': 'ZV3', 'ZF4': 'ZV4'},
       'SP2': None}}

Upvotes: 1

Related Questions