Reputation: 340
I am new to python I have a data frame with different groups and titles. Now I want to add a column based on median for each group (grp_pred), but I am not sure how I can accomplish this. This is how my df looks like
df
title M18-34 V18-34 18-34 25-54 V25-54 M25-54 18-54 V18-54 M18-54
HEPEN 0.102488 0.200995 0.312438 0.667662 0.334328 0.321393 0.739303 0.380100 0.344279
MATED 0.151090 0.208723 0.361371 0.733645 0.428349 0.280374 0.880062 0.503115 0.352025
PEERT 0.098296 0.157929 0.262779 0.624509 0.325033 0.283093 0.717562 0.384010 0.316514
RZOEK 0.143695 0.336882 0.503607 0.657216 0.414844 0.214674 0.838560 0.548663 0.255410
ERKEN 0.204918 0.409836 0.631148 0.467213 0.286885 0.163934 0.877049 0.557377 0.303279
median_dict =
{'18-34': 0.395992275,
'18-54': 0.79392129200000006,
'25-54': 0.64958055850000007,
'M18-34': 0.1171878905,
'M18-54': 0.27340067349999997,
'M25-54': 0.23422200100000001,
'V18-34': 0.2283782815,
'V18-54': 0.4497918595,
'V25-54': 0.37749252799999999}
required output
so basically I want to compare median values store in the dictionary across each title and then assign to a certain group if the value is equal to that specific median. e.g say if the median is 0.395992275 then pred_grp is 18-24 and so forth
df_out
title M18-34 V18-34 18-34 25-54 V25-54 M25-54 18-54 V18-54 M18-54 pred_grp
HEPEN 0.102488 0.200995 0.312438 0.667662 0.334328 0.321393 0.739303 0.380100 0.344279 18-54
MATED 0.151090 0.208723 0.361371 0.733645 0.428349 0.280374 0.880062 0.503115 0.352025
PEERT 0.098296 0.157929 0.262779 0.624509 0.325033 0.283093 0.717562 0.384010 0.316514
RZOEK 0.143695 0.336882 0.503607 0.657216 0.414844 0.214674 0.838560 0.548663 0.255410
ERKEN 0.204918 0.409836 0.631148 0.467213 0.286885 0.163934 0.877049 0.557377 0.303279
How would appreciate your help!!
Thanks in advance
Upvotes: 1
Views: 311
Reputation: 75080
Based on what I understood from comments , you can try creating a df of same structure from the dictionary as the input dataframe and then get the column which has the least difference:
u = df.set_index("title")
v = pd.DataFrame.from_dict(median_dict,orient='index').T.reindex(u.columns,axis=1)
df['pred_group'] = (u - v.to_numpy()).idxmin(axis=1).to_numpy()
print(df)
title M18-34 V18-34 18-34 25-54 V25-54 M25-54 \
0 HEPEN 0.102488 0.200995 0.312438 0.667662 0.334328 0.321393
1 MATED 0.151090 0.208723 0.361371 0.733645 0.428349 0.280374
2 PEERT 0.098296 0.157929 0.262779 0.624509 0.325033 0.283093
3 RZOEK 0.143695 0.336882 0.503607 0.657216 0.414844 0.214674
4 ERKEN 0.204918 0.409836 0.631148 0.467213 0.286885 0.163934
18-54 V18-54 M18-54 pred_group
0 0.739303 0.380100 0.344279 18-34
1 0.880062 0.503115 0.352025 18-34
2 0.717562 0.384010 0.316514 18-34
3 0.838560 0.548663 0.255410 M25-54
4 0.877049 0.557377 0.303279 25-54
Upvotes: 2