Reputation: 137
I have a data frame which has 86 columns. There are columns with prefixes such as name_smt1, name_smt2, ..., status_smt1, status_smt2,..., grade_smt1, grade_smt2,...
and so. Other columns are subjects, and there are more than 40 subjects columns with different names. I also have this one column named grade_t
which I use as a parameter whether I will fill all the subjects columns those contain with D
or no. If the column grade_t
in a row is null then all of subjects columns (which is null) will be filled with D
. I'm trying to do it like this but it raised error saying ValueError: shape mismatch: value array of shape (7,4) could not be broadcast to indexing result of shape (7,)
is there any way to do this without the code below? I have been working on this for 2 days, I have tried looping over it but it resulting in filling all of subjects with D
even when the IP_t
is not null
merged_df.loc[merged_df['IP_t'].isnull(),matkul] = merged_df[merged_df['IP_t'].isnull()][matkul].fillna(value='D')
Full code
merged_df['TARGET'] = merged_df['TARGET'].fillna(value='TIDAK LULUS')
list_nama_prefix = [col for col in merged_df.columns if 'NAMA_' in col and not 'NAMA_smt1' in col]
merged_df = merged_df.drop(list_nama_prefix,1)
merged_df = merged_df.rename(columns={
'NAMA_smt1' : 'NAMA'
})
list_ip = [col for col in merged_df.columns if 'IP_' in col]
smt_sebelumnya_cols = [col for col in merged_df.columns if 'STATUS LULUS SMT SEBELUMNYA_' in col]
smt_skrg_cols = [col for col in merged_df.columns if 'STATUS LULUS SMT SEKARANG_' in col]
status_sp_cols = [col for col in merged_df.columns if 'Status SP_' in col]
statuses = smt_sebelumnya_cols+status_sp_cols+smt_skrg_cols
matkul = merged_df.select_dtypes(include=['object']).drop(statuses,1).columns.tolist()
list_matkul = [i for i in matkul if i not in ('NIM', 'NAMA','TARGET')]
merged_df.loc[merged_df['IP_t'].isnull(),matkul] = merged_df[merged_df['IP_t'].isnull()][matkul].fillna(value='D')
My data if you'd like to see
Upvotes: 0
Views: 243
Reputation: 300
So if I understand your problem correctly, given input
grade_t subject_1 subject_2 other
0 NaN A+ NaN 1.0
1 1.0 B- B NaN
2 NaN NaN NaN NaN
3 1.0 NaN A 4.0
You want the output to be
grade_t subject_1 subject_2 other
0 NaN D D 1.0
1 1.0 B- B NaN
2 NaN D D NaN
3 1.0 NaN A 4.0
If so I think this is done most easily using the mask
method:
mask = data['grade_t'].isna()
subject_columns = ['subject_1', 'subject_2']
data[subject_columns] = data[subject_columns].mask(mask, other='D', axis=0)
df.mask(cond, other)
takes a boolean mask cond
and value other
, and replaces the values of df
with other
wherever cond
is True, and retains the original value from df
everywhere else.
In general, cond
can be a DataFrame of the same shape as data
, or a Series whose index matches the index or columns of df
, in which case you should specify the axis
argument (as in the snippet above).
Upvotes: 3