renakre

Reputation: 8291

Replacing values in large number of columns with another column value based on a condition

I have this data:

id   |  d1   |  d2  |  d3  | .... |  d64   | FINAL_GRADE
1    |  0    |  15  |  0   | .... |  23    | 95
2    |  8    |  0   |  12  | .... |  0     | 75   

And I want to replace all non-zero values in each row with the corresponding value in the FINAL_GRADE column, and obtain this table:

id   |  d1   |  d2  |  d3  | .... |  d64   | FINAL_GRADE
1    |  0    |  95  |  0   | .... |  95    | 95
2    |  75   |  0   |  75  | .... |  0     | 75   

Here is my code:

df[df.ix[:, 1:63] != 0] = df['FINAL_GRADE']

But, I am receiving this error: TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value

Is there an issue with my code, or is my approach wrong altogether? I appreciate any help!
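For reference, a small frame that stands in for the data above (three d-columns instead of the 64) can be built like this, with the failing assignment shown commented out:

```python
import pandas as pd

# Hypothetical three d-columns standing in for d1..d64
df = pd.DataFrame(
    {'id': [1, 2],
     'd1': [0, 8], 'd2': [15, 0], 'd3': [0, 12],
     'FINAL_GRADE': [95, 75]},
    columns=['id', 'd1', 'd2', 'd3', 'FINAL_GRADE'])

# The failing line, adapted to this width (.ix is deprecated; .iloc is
# the positional equivalent):
# df[df.iloc[:, 1:-1] != 0] = df['FINAL_GRADE']
```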

Upvotes: 2

Views: 118

Answers (2)

Nickil Maveli

Reputation: 29711

One possibility would be to use the DF.mask() method with a boolean mask.

Setting values through .ix fails here presumably because you're operating on a subset of columns with mixed dtypes: the boolean mask and subsetting operation produce floats, while the FINAL_GRADE values being looked up are ints. This is the likely cause of the TypeError.

Steps:

1) Subset the dataframe by selecting the columns that start with the character d, using str.startswith.

2) Use DF.mask on this subset wherever the values are non-zero, replacing them with the corresponding FINAL_GRADE value row-wise by specifying axis=0.

3) Finally, concatenate id, the masked DF and FINAL_GRADE column-wise using pd.concat(axis=1).


sub_df = df[df.columns[df.columns.str.startswith('d')]]        # only the d* columns
mask_df = sub_df.mask(sub_df != 0, df['FINAL_GRADE'], axis=0)  # replace non-zeros row-wise
pd.concat([df['id'], mask_df, df['FINAL_GRADE']], axis=1)
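Run against the sample data from the question (shortened to three d-columns), these steps produce the expected table:

```python
import pandas as pd

# Sample data from the question, with three d-columns standing in for 64
df = pd.DataFrame(
    {'id': [1, 2],
     'd1': [0, 8], 'd2': [15, 0], 'd3': [0, 12],
     'FINAL_GRADE': [95, 75]},
    columns=['id', 'd1', 'd2', 'd3', 'FINAL_GRADE'])

sub_df = df[df.columns[df.columns.str.startswith('d')]]
mask_df = sub_df.mask(sub_df != 0, df['FINAL_GRADE'], axis=0)
result = pd.concat([df['id'], mask_df, df['FINAL_GRADE']], axis=1)

#    id  d1  d2  d3  FINAL_GRADE
# 0   1   0  95   0           95
# 1   2  75   0  75           75
```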


Upvotes: 3

instant

Reputation: 696

The following might be slightly more crude than strictly necessary, but I think it is a clean and generalized fit for your problem:

for i, row in df.iterrows():
    d = row[1:-1]                                 # everything between id and FINAL_GRADE
    df.loc[i, d.index[d != 0]] = row.FINAL_GRADE  # write back through df.loc

Note that I'm doing a couple of things here, so some notes:

  1. row[1:-1] addresses all row items except the first (id) and the last (FINAL_GRADE), so .ix is not necessary when using integer indexes for this case, and you are not locked into the case where you have exactly 64 columns.
  2. I am looping over all rows, which is generally not considered the most efficient way of doing things, but I find it readable and sufficiently ok for cases such as yours that are not high-performance calculations repeated hundreds of times.
  3. The row index that iterrows() gives me is needed to write back through df.loc: the row Series is only a copy, so assigning into it directly would not change df.
  4. Try to use .loc more than .ix because it leverages the semantic benefit that labelling your data gives you.

I'll try to think of a solution without a for loop that is considered pythonic and not too contrived or unreadable.

EDIT: Found a short version that is in my opinion both readable and simple/general enough to be applied to other/similar problems (using label-based selection rather than the deprecated .ix, and excluding the id column):

d_cols = df.columns[1:-1]
df[d_cols] = df[d_cols].where(df[d_cols] == 0, df.FINAL_GRADE, axis=0)
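Applied to the sample from the question (three d-columns standing in for the 64), a where-based write-back can be sketched as follows; slicing with df.columns[1:-1] assumes id is the first column and FINAL_GRADE the last:

```python
import pandas as pd

# Same hypothetical sample as in the question, with three d-columns
df = pd.DataFrame(
    {'id': [1, 2],
     'd1': [0, 8], 'd2': [15, 0], 'd3': [0, 12],
     'FINAL_GRADE': [95, 75]},
    columns=['id', 'd1', 'd2', 'd3', 'FINAL_GRADE'])

d_cols = df.columns[1:-1]  # every column between id and FINAL_GRADE
# Keep zeros as-is; replace everything else with FINAL_GRADE, row-wise
df[d_cols] = df[d_cols].where(df[d_cols] == 0, df.FINAL_GRADE, axis=0)
```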

Upvotes: 1
