Pandas : Fill null in multiple columns only if another column is null

Question

I have a data frame which has 86 columns. There are columns with prefixes such as name_smt1, name_smt2, ..., status_smt1, status_smt2,..., grade_smt1, grade_smt2,... and so. Other columns are subjects, and there are more than 40 subjects columns with different names. I also have this one column named grade_t which I use as a parameter whether I will fill all the subjects columns those contain with D or no. If the column grade_t in a row is null then all of subjects columns (which is null) will be filled with D. I'm trying to do it like this but it raised error saying ValueError: shape mismatch: value array of shape (7,4) could not be broadcast to indexing result of shape (7,) is there any way to do this without the code below? I have been working on this for 2 days, I have tried looping over it but it resulting in filling all of subjects with D even when the IP_t is not null

merged_df.loc[merged_df['IP_t'].isnull(),matkul] = merged_df[merged_df['IP_t'].isnull()][matkul].fillna(value='D')

Full code

merged_df['TARGET'] = merged_df['TARGET'].fillna(value='TIDAK LULUS')

list_nama_prefix = [col for col in merged_df.columns if 'NAMA_' in col and not 'NAMA_smt1' in col]

merged_df = merged_df.drop(list_nama_prefix,1)
merged_df = merged_df.rename(columns={
    'NAMA_smt1' : 'NAMA'
})

list_ip = [col for col in merged_df.columns if 'IP_' in col]
smt_sebelumnya_cols = [col for col in merged_df.columns if 'STATUS LULUS SMT SEBELUMNYA_' in col]
smt_skrg_cols = [col for col in merged_df.columns if 'STATUS LULUS SMT SEKARANG_' in col]
status_sp_cols = [col for col in merged_df.columns if 'Status SP_' in col]
statuses = smt_sebelumnya_cols+status_sp_cols+smt_skrg_cols
matkul = merged_df.select_dtypes(include=['object']).drop(statuses,1).columns.tolist()
list_matkul = [i for i in matkul if i not in ('NIM', 'NAMA','TARGET')]

merged_df.loc[merged_df['IP_t'].isnull(),matkul] = merged_df[merged_df['IP_t'].isnull()][matkul].fillna(value='D')

My data if you'd like to see

Dom · Accepted Answer

So if I understand your problem correctly, given input

   grade_t subject_1 subject_2  other
0      NaN        A+       NaN    1.0
1      1.0        B-         B    NaN
2      NaN       NaN       NaN    NaN
3      1.0       NaN         A    4.0

You want the output to be

   grade_t subject_1 subject_2  other
0      NaN         D         D    1.0
1      1.0        B-         B    NaN
2      NaN         D         D    NaN
3      1.0       NaN         A    4.0

If so I think this is done most easily using the mask method:

mask = data['grade_t'].isna()
subject_columns = ['subject_1', 'subject_2']
data[subject_columns] = data[subject_columns].mask(mask, other='D', axis=0)

df.mask(cond, other) takes a boolean mask cond and value other, and replaces the values of df with other wherever cond is True, and retains the original value from df everywhere else.

In general, cond can be a DataFrame of the same shape as data, or a Series whose index matches the index or columns of df, in which case you should specify the axis argument (as in the snippet above).

Pandas : Fill null in multiple columns only if another column is null

Answers (1)

Related Questions