user3459293
user3459293

Reputation: 340

Add new label column based on values in dictionary in python

I am new to python I have a data frame with different groups and titles. Now I want to add a column based on median for each group (grp_pred), but I am not sure how I can accomplish this. This is how my df looks like

df
    title   M18-34       V18-34       18-34      25-54       V25-54      M25-54       18-54      V18-54      M18-54
    HEPEN   0.102488    0.200995    0.312438    0.667662    0.334328    0.321393    0.739303    0.380100    0.344279
    MATED   0.151090    0.208723    0.361371    0.733645    0.428349    0.280374    0.880062    0.503115    0.352025
    PEERT   0.098296    0.157929    0.262779    0.624509    0.325033    0.283093    0.717562    0.384010    0.316514
    RZOEK   0.143695    0.336882    0.503607    0.657216    0.414844    0.214674    0.838560    0.548663    0.255410
    ERKEN   0.204918    0.409836    0.631148    0.467213    0.286885    0.163934    0.877049    0.557377    0.303279

median_dict = 

{'18-34': 0.395992275,
 '18-54': 0.79392129200000006,
 '25-54': 0.64958055850000007,
 'M18-34': 0.1171878905,
 'M18-54': 0.27340067349999997,
 'M25-54': 0.23422200100000001,
 'V18-34': 0.2283782815,
 'V18-54': 0.4497918595,
 'V25-54': 0.37749252799999999}

required output so basically I want to compare median values store in the dictionary across each title and then assign to a certain group if the value is equal to that specific median. e.g say if the median is 0.395992275 then pred_grp is 18-24 and so forth

 df_out
        title   M18-34       V18-34       18-34      25-54       V25-54      M25-54        18-54      V18-54      M18-54  pred_grp
        HEPEN   0.102488    0.200995    0.312438    0.667662    0.334328    0.321393    0.739303    0.380100    0.344279 18-54
        MATED   0.151090    0.208723    0.361371    0.733645    0.428349    0.280374    0.880062    0.503115    0.352025
        PEERT   0.098296    0.157929    0.262779    0.624509    0.325033    0.283093    0.717562    0.384010    0.316514
        RZOEK   0.143695    0.336882    0.503607    0.657216    0.414844    0.214674    0.838560    0.548663    0.255410
        ERKEN   0.204918    0.409836    0.631148    0.467213    0.286885    0.163934    0.877049    0.557377    0.303279

How would appreciate your help!!

Thanks in advance

Upvotes: 1

Views: 311

Answers (1)

anky
anky

Reputation: 75080

Based on what I understood from comments , you can try creating a df of same structure from the dictionary as the input dataframe and then get the column which has the least difference:

u = df.set_index("title")
v = pd.DataFrame.from_dict(median_dict,orient='index').T.reindex(u.columns,axis=1)
df['pred_group'] = (u - v.to_numpy()).idxmin(axis=1).to_numpy()

print(df)

   title    M18-34    V18-34     18-34     25-54    V25-54    M25-54  \
0  HEPEN  0.102488  0.200995  0.312438  0.667662  0.334328  0.321393   
1  MATED  0.151090  0.208723  0.361371  0.733645  0.428349  0.280374   
2  PEERT  0.098296  0.157929  0.262779  0.624509  0.325033  0.283093   
3  RZOEK  0.143695  0.336882  0.503607  0.657216  0.414844  0.214674   
4  ERKEN  0.204918  0.409836  0.631148  0.467213  0.286885  0.163934   

      18-54    V18-54    M18-54 pred_group  
0  0.739303  0.380100  0.344279      18-34  
1  0.880062  0.503115  0.352025      18-34  
2  0.717562  0.384010  0.316514      18-34  
3  0.838560  0.548663  0.255410     M25-54  
4  0.877049  0.557377  0.303279      25-54 

Upvotes: 2

Related Questions