Reputation: 125
I am trying to create a dictionary where the value for each key is two dictionaries.
I have two lists of patient (normal tissue, disease tissue) barcodes that correspond to columns of values in a dataframe. My goal is to match patients that are in both lists and then, for each patient found in both lists, append their normal and disease tissue values to a dictionary. The dictionary key would be the patient barcode and the dictionary value would be another dictionary of the normal tissue: values pulled from the dataframe and disease tissue: values pulled from the dataframe.
So starting with
In [3]: df = pd.DataFrame({'Patient1_Normal':['nan', 0.01, 0.1, 0.16, 0.88, 0.83, 0.82, 'nan'],
'Patient1_Disease':[0.12, 0.06, 0.19, 0.34, 'nan', 'nan', 0.73, 0.91],
'Patient2_Disease':['nan', 'nan', 'nan', 1.0, 0.24, 0.67, 0.97, 0.98],
'Patient3_Normal': [0.21, 0.25,0.63,0.92,0.3, 0.56, 0.78, 0.9],
'Patient3_Disease':[0.11, 0.45, 'nan', 0.45, 0.22, 0.89, 0.17, 0.12],
'Patient4_Normal':['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91],
'Patient4_Disease':['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
'Patient5_Disease': [0.34, 0.27, 'nan', 0.16, 0.32, 0.27, 0.55, 0.51]})
In [4]: df
Out[4]: Patient1_Normal Patient1_Disease Patient2_Disease Patient3_Normal Patient3_Disease Patient4_Normal Patient4_Disease Patient5_Disease
0 nan 0.12 nan 0.21 0.11 nan nan 0.34
1 0.01 0.06 nan 0.25 0.45 0.35 nan 0.27
2 0.1 0.19 nan 0.63 nan nan 0.56 nan
3 0.16 0.34 1 0.92 0.45 0.22 0.72 0.16
4 0.88 nan 0.24 0.30 0.22 0.45 nan 0.32
5 0.83 nan 0.67 0.56 0.89 0.66 0.97 0.27
6 0.82 0.73 0.97 0.78 0.17 0.21 0.91 0.55
7 nan 0.91 0.98 0.90 0.12 0.91 0.79 0.51
Here is what I have so far:
D_col = [col for col in df if '_Disease' in col]
N_col = [col for col in df if '_Normal' in col]
paired_patients = {}
psi_sets = {}
psi_sets['d'] = []
psi_sets['n'] = []
for patient in N_col:
patient_id = patient[0:8]
n_id = patient
d_id = [i for i in D_col if patient_id in i]
if len(d_id) > 0:
psi_sets['n'] = df[n_id].to_list()
for d in d_id:
psi_sets['d'] = df[d].to_list()
paired_patients[patient_id] = psi_sets
However, my paired_patients
dictionary values are overwriting instead of appending, so the output for paired_patients
looks like this:
{'Patient1': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]},
'Patient3': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]},
'Patient4': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]}}
How do I fix the last bit of code to append paired_patient
dictionary values correctly for each patient, such that the paired_patient
dictionary looks like:
{'Patient1': {'d': [0.12, 0.06, 0.19, 0.34, 'nan', 'nan', 0.73, 0.91],
'n': ['nan', 0.01, 0.1, 0.16, 0.88, 0.83, 0.82, 'nan']},
'Patient3': {'d': [0.11, 0.45, 'nan', 0.45, 0.22, 0.89, 0.17, 0.12],
'n': [0.21, 0.25,0.63,0.92,0.3, 0.56, 0.78, 0.9]},
'Patient4': {'nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]}}
Upvotes: 0
Views: 110
Reputation: 15872
You can use df.melt
, pd.concat
, series.str.split
, df.replace
, df.groupby
and df.xs
and then finally df.to_dict
.
Please check out following:
>>> df2 = (pd.concat([
df.melt().variable.str.split('_', expand=True),
df.melt().drop('variable',1)
], axis=1)
.replace({'Normal':'n', 'Disease':'d'})
.groupby([0,1]).agg(list))
>>> paired_patients = {k: v for k, v in
df2.groupby(level=0)
.apply(lambda df: df.xs(df.name).value.to_dict())
.to_dict().items()
if not ({'d', 'n'} ^ v.keys())}
>>> paired_patients
{'Patient1': {'d': [0.12, 0.06, 0.19, 0.34, 'nan', 'nan', 0.73, 0.91],
'n': ['nan', 0.01, 0.1, 0.16, 0.88, 0.83, 0.82, 'nan']},
'Patient3': {'d': [0.11, 0.45, 'nan', 0.45, 0.22, 0.89, 0.17, 0.12],
'n': [0.21, 0.25,0.63,0.92,0.3, 0.56, 0.78, 0.9]},
'Patient4': {'nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]}}
EXPLANTION:
>>> df.melt()
variable value
0 Patient1_Normal NaN
1 Patient1_Normal 0.01
2 Patient1_Normal 0.10
.. ... ...
62 Patient5_Disease 0.55
63 Patient5_Disease 0.51
>>> df.melt().variable.str.split('_', expand=True)
0 1
0 Patient1 Normal
1 Patient1 Normal
2 Patient1 Normal
.. ... ...
62 Patient5 Disease
63 Patient5 Disease
[64 rows x 2 columns]
# then concat these two, replace 'Normal' and 'Disease' with 'n' and 'd' and drop
# the 'variable' column
>>> pd.concat([
df.melt().variable.str.split('_', expand=True),
df.melt().drop('variable',1)
], axis=1).replace({'Normal':'n', 'Disease':'d'})
0 1 value
0 Patient1 n NaN
1 Patient1 n 0.01
2 Patient1 n 0.10
.. ... .. ...
62 Patient5 d 0.55
63 Patient5 d 0.51
[64 rows x 3 columns]
# then groupby column [0, 1] and aggregate into list:
>>> df2 = _.groupby([0,1]).agg(list)
>>> df2
value
0 1
Patient1 d [0.12, 0.06, 0.19, 0.34, nan, nan, 0.73, 0.91]
n [nan, 0.01, 0.1, 0.16, 0.88, 0.83, 0.82, nan]
Patient2 d [nan, nan, nan, 1.0, 0.24, 0.67, 0.97, 0.98]
Patient3 d [0.11, 0.45, nan, 0.45, 0.22, 0.89, 0.17, 0.12]
n [0.21, 0.25, 0.63, 0.92, 0.3, 0.56, 0.78, 0.9]
Patient4 d [nan, nan, 0.56, 0.72, nan, 0.97, 0.91, 0.79]
n [nan, 0.35, nan, 0.22, 0.45, 0.66, 0.21, 0.91]
Patient5 d [0.34, 0.27, nan, 0.16, 0.32, 0.27, 0.55, 0.51]
# Now groupby level=0, and convert that into dict, and finally check whether
# both 'n' and 'd' are present as keys by using symmetric set difference
# properties of dict_keys objects
>>> paired_patients = {k: v for k, v in
df2.groupby(level=0)
.apply(lambda df: df.xs(df.name).value.to_dict())
.to_dict().items()
if ('n' in v) and ('d' in v)}
Upvotes: 1
Reputation: 4199
D_col = [col for col in df if '_Disease' in col]
N_col = [col for col in df if '_Normal' in col]
paired_patients = {}
for patient in N_col:
psi_sets = {}
patient_id = patient[0:8]
n_id = patient
d_id = [i for i in D_col if patient_id in i]
if len(d_id) > 0:
psi_sets['n'] = df[n_id].to_list()
for d in d_id:
psi_sets['d'] = df[d].to_list()
paired_patients[patient_id] = psi_sets
Upvotes: 1