Emerson

Reputation: 125

How to append dictionaries to a dictionary in a for loop?

I am trying to create a dictionary where the value for each key is itself a dictionary holding two lists.

I have two lists of patient barcodes (normal tissue and disease tissue) that correspond to columns of values in a dataframe. My goal is to find the patients that appear in both lists and, for each of them, add their normal and disease tissue values to a dictionary. The key would be the patient barcode, and the value would be another dictionary of the form {normal tissue: values pulled from the dataframe, disease tissue: values pulled from the dataframe}.

So starting with

In [3]: df = pd.DataFrame({'Patient1_Normal':['nan', 0.01, 0.1, 0.16, 0.88, 0.83, 0.82, 'nan'],
                 'Patient1_Disease':[0.12, 0.06, 0.19, 0.34, 'nan', 'nan', 0.73, 0.91],
                 'Patient2_Disease':['nan', 'nan', 'nan', 1.0, 0.24, 0.67, 0.97, 0.98],
                 'Patient3_Normal': [0.21, 0.25,0.63,0.92,0.3, 0.56, 0.78, 0.9],
                 'Patient3_Disease':[0.11, 0.45, 'nan', 0.45, 0.22, 0.89, 0.17, 0.12],
                 'Patient4_Normal':['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91],
                 'Patient4_Disease':['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
                 'Patient5_Disease': [0.34, 0.27, 'nan', 0.16, 0.32, 0.27, 0.55, 0.51]})


In [4]: df                                                                                                                                 
Out[4]: Patient1_Normal Patient1_Disease Patient2_Disease  Patient3_Normal Patient3_Disease Patient4_Normal Patient4_Disease Patient5_Disease
    0             nan             0.12              nan             0.21             0.11             nan              nan             0.34
    1            0.01             0.06              nan             0.25             0.45            0.35              nan             0.27
    2             0.1             0.19              nan             0.63              nan             nan             0.56              nan
    3            0.16             0.34                1             0.92             0.45            0.22             0.72             0.16
    4            0.88              nan             0.24             0.30             0.22            0.45              nan             0.32
    5            0.83              nan             0.67             0.56             0.89            0.66             0.97             0.27
    6            0.82             0.73             0.97             0.78             0.17            0.21             0.91             0.55
    7             nan             0.91             0.98             0.90             0.12            0.91             0.79             0.51

Here is what I have so far:

D_col = [col for col in df if '_Disease' in col]
N_col = [col for col in df if '_Normal' in col]

paired_patients = {}
psi_sets = {}
psi_sets['d'] = []
psi_sets['n'] = []

for patient in N_col:
    patient_id = patient[0:8]

    n_id = patient
    d_id = [i for i in D_col if patient_id in i]

    if len(d_id) > 0:
        psi_sets['n'] = df[n_id].to_list()
        for d in d_id:
            psi_sets['d'] = df[d].to_list()

    paired_patients[patient_id] = psi_sets

However, every value in my paired_patients dictionary ends up holding the data of the last patient instead of that patient's own data, so the output for paired_patients looks like this:

{'Patient1': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
  'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]},
 'Patient3': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
  'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]},
 'Patient4': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
  'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]}}

How do I fix the last bit of code so that each patient gets its own values, such that the paired_patients dictionary looks like this:

{'Patient1': {'d': [0.12, 0.06, 0.19, 0.34, 'nan', 'nan', 0.73, 0.91],
  'n': ['nan', 0.01, 0.1, 0.16, 0.88, 0.83, 0.82, 'nan']},
 'Patient3': {'d': [0.11, 0.45, 'nan', 0.45, 0.22, 0.89, 0.17, 0.12],
  'n': [0.21, 0.25, 0.63, 0.92, 0.3, 0.56, 0.78, 0.9]},
 'Patient4': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
  'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]}}

Upvotes: 0

Views: 110

Answers (2)

Sayandip Dutta

Reputation: 15872

You can combine df.melt, pd.concat, Series.str.split, df.replace, df.groupby and df.xs, and then convert the result with to_dict. Please check out the following:

>>> df2 = (pd.concat([
                      df.melt().variable.str.split('_', expand=True),
                      df.melt().drop(columns='variable')
                    ], axis=1)
                       .replace({'Normal':'n', 'Disease':'d'})
                       .groupby([0,1]).agg(list))
>>> paired_patients = {k: v for k, v in
                       df2.groupby(level=0)
                          .apply(lambda df: df.xs(df.name).value.to_dict())
                          .to_dict().items()
                       if not ({'d', 'n'} ^ v.keys())}
>>> paired_patients
{'Patient1': {'d': [0.12, 0.06, 0.19, 0.34, 'nan', 'nan', 0.73, 0.91],
  'n': ['nan', 0.01, 0.1, 0.16, 0.88, 0.83, 0.82, 'nan']},
 'Patient3': {'d': [0.11, 0.45, 'nan', 0.45, 0.22, 0.89, 0.17, 0.12],
  'n': [0.21, 0.25, 0.63, 0.92, 0.3, 0.56, 0.78, 0.9]},
 'Patient4': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
  'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]}}

EXPLANATION:

>>> df.melt()
            variable  value
0    Patient1_Normal    NaN
1    Patient1_Normal   0.01
2    Patient1_Normal   0.10
..               ...    ...
62  Patient5_Disease   0.55
63  Patient5_Disease   0.51

>>> df.melt().variable.str.split('_', expand=True)
 
           0        1
0   Patient1   Normal
1   Patient1   Normal
2   Patient1   Normal
..       ...      ...
62  Patient5  Disease
63  Patient5  Disease

[64 rows x 2 columns]

# then concat these two, replace 'Normal' and 'Disease' with 'n' and 'd' and drop
# the 'variable' column
>>> pd.concat([
                      df.melt().variable.str.split('_', expand=True),
                      df.melt().drop(columns='variable')
                    ], axis=1).replace({'Normal':'n', 'Disease':'d'})
           0  1  value
0   Patient1  n    NaN
1   Patient1  n   0.01
2   Patient1  n   0.10
..       ... ..    ...
62  Patient5  d   0.55
63  Patient5  d   0.51

[64 rows x 3 columns]

# then groupby column [0, 1] and aggregate into list:
>>> df2 = _.groupby([0,1]).agg(list)
>>> df2
                                                      value
0        1                                                 
Patient1 d   [0.12, 0.06, 0.19, 0.34, nan, nan, 0.73, 0.91]
         n    [nan, 0.01, 0.1, 0.16, 0.88, 0.83, 0.82, nan]
Patient2 d     [nan, nan, nan, 1.0, 0.24, 0.67, 0.97, 0.98]
Patient3 d  [0.11, 0.45, nan, 0.45, 0.22, 0.89, 0.17, 0.12]
         n   [0.21, 0.25, 0.63, 0.92, 0.3, 0.56, 0.78, 0.9]
Patient4 d    [nan, nan, 0.56, 0.72, nan, 0.97, 0.91, 0.79]
         n   [nan, 0.35, nan, 0.22, 0.45, 0.66, 0.21, 0.91]
Patient5 d  [0.34, 0.27, nan, 0.16, 0.32, 0.27, 0.55, 0.51]

# Now groupby level=0, convert that into a dict, and finally keep only the
# patients that have both 'n' and 'd' as keys. The membership test below is
# a more explicit alternative to the symmetric set difference on the
# dict_keys object used in the first snippet.

>>> paired_patients = {k: v for k, v in
                       df2.groupby(level=0)
                          .apply(lambda df: df.xs(df.name).value.to_dict())
                          .to_dict().items()
                       if ('n' in v) and ('d' in v)}
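
The filter in the first snippet relies on dictionary key views behaving like sets: the symmetric difference between the keys and {'d', 'n'} is empty only when the keys are exactly 'd' and 'n'. Here is a small pure-Python sketch of that check. The sample data and the names `candidates` and `has_both` are hypothetical and only serve as an illustration:

```python
# Hypothetical sample data: only Patient1 has both tissue types.
candidates = {
    'Patient1': {'d': [0.12, 0.06], 'n': [0.01, 0.1]},
    'Patient2': {'d': [1.0, 0.24]},
}

def has_both(values):
    """Return True when the keys of `values` are exactly 'd' and 'n'."""
    # Key views support set operators; the symmetric difference is empty
    # only if both sides contain the same elements.
    return not (values.keys() ^ {'d', 'n'})

paired = {k: v for k, v in candidates.items() if has_both(v)}
print(sorted(paired))  # ['Patient1']
```

Note that this test is stricter than `('n' in v) and ('d' in v)`: it also rejects dictionaries that carry any extra key besides 'd' and 'n'. For the data in the question both tests give the same result.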

Upvotes: 1

ComplicatedPhenomenon

Reputation: 4199

D_col = [col for col in df if '_Disease' in col]
N_col = [col for col in df if '_Normal' in col]
paired_patients = {}


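for patient in N_col:
    psi_sets = {}  # fresh dict for each patient, so earlier entries are not overwritten
    patient_id = patient[0:8]
    n_id = patient
    d_id = [i for i in D_col if patient_id in i]

    if len(d_id) > 0:
        psi_sets['n'] = df[n_id].to_list()
        for d in d_id:
            psi_sets['d'] = df[d].to_list()
        # store only patients that have both normal and disease columns
        paired_patients[patient_id] = psi_sets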
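
This works because psi_sets is now created inside the loop. In the question, a single psi_sets dictionary was created once before the loop, so every key of paired_patients pointed at the same object and each iteration overwrote its contents. Here is a minimal sketch of the difference, using plain lists instead of a dataframe. The `columns` data and both function names are hypothetical and for illustration only:

```python
# Hypothetical stand-in for the dataframe: column name -> list of values.
columns = {
    'Patient1_Normal': [0.01, 0.1],
    'Patient1_Disease': [0.12, 0.06],
    'Patient3_Normal': [0.21, 0.25],
    'Patient3_Disease': [0.11, 0.45],
}

def pair_shared(cols):
    """Buggy version: one dict object is reused for every patient."""
    paired, psi_sets = {}, {}
    for name in cols:
        if name.endswith('_Normal'):
            patient_id = name.split('_')[0]
            psi_sets['n'] = cols[name]
            psi_sets['d'] = cols[patient_id + '_Disease']
            paired[patient_id] = psi_sets  # same object stored each time
    return paired

def pair_fresh(cols):
    """Fixed version: a new dict is built on every iteration."""
    paired = {}
    for name in cols:
        if name.endswith('_Normal'):
            patient_id = name.split('_')[0]
            disease = patient_id + '_Disease'
            if disease in cols:  # keep only patients with both tissue types
                paired[patient_id] = {'n': cols[name], 'd': cols[disease]}
    return paired

shared = pair_shared(columns)
fresh = pair_fresh(columns)
print(shared['Patient1'] is shared['Patient3'])  # True: both keys alias one dict
print(fresh['Patient1']['d'])                    # [0.12, 0.06]
```

Splitting the column name on '_' instead of slicing the first eight characters is a design choice in this sketch: it keeps working if a barcode is longer or shorter than 'Patient1'.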

Upvotes: 1
