Reputation: 528
I'm comparing (sub)strings across two columns of a data frame.
Following the suggestions from this thread, I can now set a flag to True when the value in the column manual_label appears in the column prediction, and to False when it does not.
Here is a snapshot of the code I used:
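import pandas as pd

# df is the existing data frame with the columns manual_label and prediction
argument_cols = ['prediction']
# For each row, check whether manual_label occurs in the prediction value
boolean_idx = df[argument_cols].apply(
    lambda arg_column: df['manual_label'].combine(arg_column, lambda token, arg: token in arg)
)
df['boolean_idx'] = boolean_idx
pd.options.display.width = None
print(df)
df.to_csv('csv_file_w_pred.csv', sep=',', index=False)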
The resulting data frame looks like this:
subject  manual_label                                 prediction                                         value  boolean_idx
A        __label__Religione_e_Magia                   (__label__Bibbia_storie_dal_Vecchio_e_dal_Nuov...  ...    False
B        __label__Religione_e_Magia                   (__label__Religione_e_Magia,__label__Storia)       ...    True
C        __label__Mitologia_classica_e_storia_antica  (__label__Societa_civilizzazione_cultura,)         ...    False
D        __label__Essere_umano_uomo_in_generale       (__label__Essere_umano_uomo_in_generale,)          ...    True
E        __label__Religione_e_Magia                   (__label__Religione_e_Magia,)                      ...    True
The column prediction can hold multiple labels.
Instead of a boolean, I would like the following: if the condition is True, the label that satisfies the condition; if the condition is False, the first label from the prediction column.
Desired output:
subject  manual_label                                 prediction                                         value  boolean_idx
A        __label__Religione_e_Magia                   (__label__Bibbia_storie_dal_Vecchio_e_dal_Nuov...  ...    __label__Bibbia_storie_dal_Vecchio_e_dal_Nuovo_Testamento
B        __label__Religione_e_Magia                   (__label__Religione_e_Magia,__label__Storia)       ...    __label__Religione_e_Magia
C        __label__Mitologia_classica_e_storia_antica  (__label__Societa_civilizzazione_cultura,)         ...    __label__Societa_civilizzazione_cultura
D        __label__Essere_umano_uomo_in_generale       (__label__Essere_umano_uomo_in_generale,)          ...    __label__Essere_umano_uomo_in_generale
E        __label__Religione_e_Magia                   (__label__Religione_e_Magia,)                      ...    __label__Religione_e_Magia
Suggestions?
Regards
Upvotes: 1
Views: 567
Reputation: 528
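In case anyone needs it, I solved the issue as follows. I split the predictions into separate columns (label_1, label_2, label_3), kept the label that matches manual_label, and fell back to label_1 when nothing matched: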
import numpy as np
import pandas as pd

argument_cols = ['label_1', 'label_2', 'label_3']
# True where manual_label occurs in the corresponding label column
boolean_idx = df[argument_cols].apply(
    lambda arg_column: df['manual_label'].combine(arg_column, lambda token, arg: token in arg)
)
# Keep only the matching labels; non-matching cells become NaN, then ''
selected_vals = df[argument_cols][boolean_idx].fillna('').astype(str)
# Concatenate the per-column matches into a single string per row
df['suggested_label'] = selected_vals['label_1'] + selected_vals['label_2'] + selected_vals['label_3']
# Turn empty or whitespace-only strings back into NaN
df = df.replace(r'^\s*$', np.nan, regex=True)
# No match at all: fall back to the first predicted label
df.loc[df['suggested_label'].isnull(), 'suggested_label'] = df['label_1']
print(df)
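For anyone who just wants the selection rule itself ("take the matching label, otherwise the first prediction"), here is a minimal plain-Python sketch. It is not the code above: the name pick_label and the sample rows are hypothetical, and it assumes each prediction is a tuple of label strings, which the post only shows in printed form. It also matches on exact equality rather than on substrings.

```python
# Hypothetical sample rows: (manual_label, predictions).
# Assumption: predictions is a tuple of label strings.
rows = [
    ("__label__Religione_e_Magia",
     ("__label__Bibbia_storie_dal_Vecchio_e_dal_Nuovo_Testamento",)),
    ("__label__Religione_e_Magia",
     ("__label__Religione_e_Magia", "__label__Storia")),
    ("__label__Mitologia_classica_e_storia_antica",
     ("__label__Societa_civilizzazione_cultura",)),
]


def pick_label(manual_label, predictions):
    """Return manual_label if it is among predictions, else the first prediction."""
    return manual_label if manual_label in predictions else predictions[0]


# One suggested label per row
suggested = [pick_label(manual, preds) for manual, preds in rows]
for label in suggested:
    print(label)
```

With a data frame, the same function could probably be applied row by row, for example with a list comprehension over zip(df['manual_label'], df['prediction']), as long as the prediction cells really are tuples or lists.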
Upvotes: 1